Speeding up pg_upgrade

From
Bruce Momjian
Date:
As part of PGConf.Asia 2017 in Tokyo, we had an unconference topic about
zero-downtime upgrades.  After the usual discussion of using logical
replication, Slony, and perhaps having the server be able to read old
and new system catalogs, we discussed speeding up pg_upgrade.

There are clusters that take a long time to dump the schema from the old
cluster and recreate it in the new cluster.  One idea of speeding up
pg_upgrade would be to allow pg_upgrade to be run in two stages:

1.  prevent system catalog changes while the old cluster is running, and
dump the old cluster's schema and restore it in the new cluster

2.  shut down the old cluster and copy/link the data files

My question is whether the schema dump/restore is time-consuming enough
to warrant this optional more complex API, and whether people would
support adding a server setting that prevented all system table changes?
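
For reference, the single-stage, link-mode invocation that this would split up
looks roughly like the following (paths are placeholders, and the staged
options in the comments are purely hypothetical illustrations, not existing
switches):

    # Today: one offline window covers both the schema dump/restore and the file linking.
    pg_upgrade -b /usr/lib/postgresql/9.6/bin -B /usr/lib/postgresql/10/bin \
               -d /var/lib/postgresql/9.6/main -D /var/lib/postgresql/10/main \
               --link

    # Proposed split (hypothetical options, for illustration only):
    #   stage 1, old cluster still running with catalog changes blocked:
    #       pg_upgrade ... --stage=schema
    #   stage 2, old cluster stopped:
    #       pg_upgrade ... --stage=files --link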

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: Speeding up pg_upgrade

From
Stephen Frost
Date:
Bruce,

* Bruce Momjian (bruce@momjian.us) wrote:
> As part of PGConf.Asia 2017 in Tokyo, we had an unconference topic about
> zero-downtime upgrades.  After the usual discussion of using logical
> replication, Slony, and perhaps having the server be able to read old
> and new system catalogs, we discussed speeding up pg_upgrade.

Sounds familiar.

> There are clusters that take a long time to dump the schema from the old
> cluster and recreate it in the new cluster.  One idea of speeding up
> pg_upgrade would be to allow pg_upgrade to be run in two stages:
>
> 1.  prevent system catalog changes while the old cluster is running, and
> dump the old cluster's schema and restore it in the new cluster
>
> 2.  shut down the old cluster and copy/link the data files

Perhaps a bit more complicated, but couldn't we copy/link while the
old cluster is online and in backup mode, finish backup mode, shut down
the old cluster, and then play forward the WAL to catch any relation
extents being added or similar, and then flip to the new PG version?

> My question is whether the schema dump/restore is time-consuming enough
> to warrant this optional more complex API, and whether people would
> support adding a server setting that prevented all system table changes?

When you say 'system table changes', you're referring to basically all
DDL, right?  Just wish to clarify as there might be some confusion
between the terminology you're using here and allow_system_table_mods.

Would we need to have autovacuum shut down too..?

The other concern is if there's changes made to the catalogs by non-DDL
activity that needs to be addressed too (logical replication?); nothing
definite springs to mind off-hand for me, but perhaps others will think
of things.

Thanks!

Stephen

Re: Speeding up pg_upgrade

From
Dave Page
Date:
Hi

On Tue, Dec 5, 2017 at 11:01 PM, Bruce Momjian <bruce@momjian.us> wrote:
> As part of PGConf.Asia 2017 in Tokyo, we had an unconference topic about
> zero-downtime upgrades.  After the usual discussion of using logical
> replication, Slony, and perhaps having the server be able to read old
> and new system catalogs, we discussed speeding up pg_upgrade.
>
> There are clusters that take a long time to dump the schema from the old
> cluster and recreate it in the new cluster.  One idea of speeding up
> pg_upgrade would be to allow pg_upgrade to be run in two stages:
>
> 1.  prevent system catalog changes while the old cluster is running, and
> dump the old cluster's schema and restore it in the new cluster
>
> 2.  shut down the old cluster and copy/link the data files

When we were discussing this, I was thinking that the linking could be done in phase 1 too, as that's potentially slow on a very large schema.

> My question is whether the schema dump/restore is time-consuming enough
> to warrant this optional more complex API, and whether people would
> support adding a server setting that prevented all system table changes?

I've certainly heard of cases where pg_upgrade takes significant amounts of time to run on very complex databases. 

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Speeding up pg_upgrade

From
Bruce Momjian
Date:
On Tue, Dec  5, 2017 at 09:16:02AM -0500, Stephen Frost wrote:
> Bruce,
> 
> * Bruce Momjian (bruce@momjian.us) wrote:
> > As part of PGConf.Asia 2017 in Tokyo, we had an unconference topic about
> > zero-downtime upgrades.  After the usual discussion of using logical
> > replication, Slony, and perhaps having the server be able to read old
> > and new system catalogs, we discussed speeding up pg_upgrade.
> 
> Sounds familiar.

Yeah.  :-|

> > There are clusters that take a long time to dump the schema from the old
> > cluster and recreate it in the new cluster.  One idea of speeding up
> > pg_upgrade would be to allow pg_upgrade to be run in two stages:
> > 
> > 1.  prevent system catalog changes while the old cluster is running, and
> > dump the old cluster's schema and restore it in the new cluster
> > 
> > 2.  shut down the old cluster and copy/link the data files
> 
> Perhaps a bit more complicated, but couldn't we copy/link while the
> old cluster is online and in backup mode, finish backup mode, shut down
> the old cluster, and then play forward the WAL to catch any relation
> extents being added or similar, and then flip to the new PG version?

Well, that would require reading the old WAL, which would add an
additional compatibility requirement that seems unwise.

> > My question is whether the schema dump/restore is time-consuming enough
> > to warrant this optional more complex API, and whether people would
> > support adding a server setting that prevented all system table changes?
> 
> When you say 'system table changes', you're referring to basically all
> DDL, right?  Just wish to clarify as there might be some confusion
> between the terminology you're using here and allow_system_table_mods.

Not only all DDL, but even updating them for the internal stuff, like
pg_class.relfrozenxid.

> Would we need to have autovacuum shut down too..?

Yes.

> The other concern is if there's changes made to the catalogs by non-DDL
> activity that needs to be addressed too (logical replication?); nothing
> definite springs to mind off-hand for me, but perhaps others will think
> of things.

Yes, it could extend to many parts of the system, which is why I am
asking if it is worth it.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: Speeding up pg_upgrade

From
Bruce Momjian
Date:
On Tue, Dec  5, 2017 at 11:16:26PM +0900, Dave Page wrote:
> Hi
> 
> On Tue, Dec 5, 2017 at 11:01 PM, Bruce Momjian <bruce@momjian.us> wrote:
> 
>     As part of PGConf.Asia 2017 in Tokyo, we had an unconference topic about
>     zero-downtime upgrades.  After the usual discussion of using logical
>     replication, Slony, and perhaps having the server be able to read old
>     and new system catalogs, we discussed speeding up pg_upgrade.
> 
>     There are clusters that take a long time to dump the schema from the old
>     cluster and recreate it in the new cluster.  One idea of speeding up
>     pg_upgrade would be to allow pg_upgrade to be run in two stages:
> 
>     1.  prevent system catalog changes while the old cluster is running, and
>     dump the old cluster's schema and restore it in the new cluster
> 
>     2.  shut down the old cluster and copy/link the data files
> 
> 
> When we were discussing this, I was thinking that the linking could be done in
> phase 1 too, as that's potentially slow on a very large schema.

Uh, good point!  You can create the hard links while the old system is
running, no problem!  It would only be the copy that can't be done while the
system is running.  Of course, a big question is whether hard linking
takes any measurable time.
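
A quick way to get a feel for that, outside of pg_upgrade, is to time a
hard-link clone of an existing data directory; a rough sketch, with
placeholder paths (hard links cannot cross filesystems, so the clone has to
live on the same one):

    # Compare hard-linking against a full copy of a cluster's files (run as the data owner).
    time cp -rl /var/lib/postgresql/9.6/main /var/lib/postgresql/linktest   # hard links only
    time cp -r  /var/lib/postgresql/9.6/main /var/lib/postgresql/copytest   # full copy, for comparison
    rm -rf /var/lib/postgresql/linktest /var/lib/postgresql/copytest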

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: Speeding up pg_upgrade

From
Stephen Frost
Date:
Dave,

* Dave Page (dpage@pgadmin.org) wrote:
> On Tue, Dec 5, 2017 at 11:01 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > As part of PGConf.Asia 2017 in Tokyo, we had an unconference topic about
> > zero-downtime upgrades.  After the usual discussion of using logical
> > replication, Slony, and perhaps having the server be able to read old
> > and new system catalogs, we discussed speeding up pg_upgrade.
> >
> > There are clusters that take a long time to dump the schema from the old
> > cluster and recreate it in the new cluster.  One idea of speeding up
> > pg_upgrade would be to allow pg_upgrade to be run in two stages:
> >
> > 1.  prevent system catalog changes while the old cluster is running, and
> > dump the old cluster's schema and restore it in the new cluster
> >
> > 2.  shut down the old cluster and copy/link the data files
>
> When we were discussing this, I was thinking that the linking could be done
> in phase 1 too, as that's potentially slow on a very large schema.

Right, I had that thought too when first reading this, but the problem
there is that new files can show up due to a relation being extended (at
least, and perhaps in other cases too..).

> > My question is whether the schema dump/restore is time-consuming enough
> > to warrant this optional more complex API, and whether people would
> > support adding a server setting that prevented all system table changes?
>
> I've certainly heard of cases where pg_upgrade takes significant amounts of
> time to run on very complex databases.

Right, but that doesn't really answer the question as to which part of
the pg_upgrade process is taking up the time.

In any case, of course, if we're able to move part of what pg_upgrade
does to be while the old server is online then that takes whatever the
cost of that is out of the downtime window.  The question is if that's a
5% improvement in the overall performance of pg_upgrade or a 70% one.
This will be case-by-case, of course, but if, in the best-case, we only
get a 5% improvement then this might not be worth the risk.

Thanks!

Stephen

Re: Speeding up pg_upgrade

From
Bruce Momjian
Date:
On Tue, Dec  5, 2017 at 09:23:49AM -0500, Stephen Frost wrote:
> Dave,
> 
> * Dave Page (dpage@pgadmin.org) wrote:
> > On Tue, Dec 5, 2017 at 11:01 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > > As part of PGConf.Asia 2017 in Tokyo, we had an unconference topic about
> > > zero-downtime upgrades.  After the usual discussion of using logical
> > > replication, Slony, and perhaps having the server be able to read old
> > > and new system catalogs, we discussed speeding up pg_upgrade.
> > >
> > > There are clusters that take a long time to dump the schema from the old
> > > cluster and recreate it in the new cluster.  One idea of speeding up
> > > pg_upgrade would be to allow pg_upgrade to be run in two stages:
> > >
> > > 1.  prevent system catalog changes while the old cluster is running, and
> > > dump the old cluster's schema and restore it in the new cluster
> > >
> > > 2.  shut down the old cluster and copy/link the data files
> > 
> > When we were discussing this, I was thinking that the linking could be done
> > in phase 1 too, as that's potentially slow on a very large schema.
> 
> Right, I had that thought too when first reading this, but the problem
> there is that new files can show up due to a relation being extended (at
> least, and perhaps in other cases too..).

Oh, yikes, yes.

> > > My question is whether the schema dump/restore is time-consuming enough
> > > to warrant this optional more complex API, and whether people would
> > > support adding a server setting that prevented all system table changes?
> > 
> > I've certainly heard of cases where pg_upgrade takes significant amounts of
> > time to run on very complex databases.
> 
> Right, but that doesn't really answer the question as to which part of
> the pg_upgrade process is taking up the time.
> 
> In any case, of course, if we're able to move part of what pg_upgrade
> does to be while the old server is online then that takes whatever the
> cost of that is out of the downtime window.  The question is if that's a
> 5% improvement in the overall performance of pg_upgrade or a 70% one.
> This will be case-by-case, of course, but if, in the best-case, we only
> get a 5% improvement then this might not be worth the risk.

Yes, and who is going to know if they have a setup where the more
complex API is worth it?  pg_upgrade is already complex enough to use.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: Speeding up pg_upgrade

From
Stephen Frost
Date:
Bruce,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Tue, Dec  5, 2017 at 09:16:02AM -0500, Stephen Frost wrote:
> > > There are clusters that take a long time to dump the schema from the old
> > > cluster and recreate it in the new cluster.  One idea of speeding up
> > > pg_upgrade would be to allow pg_upgrade to be run in two stages:
> > >
> > > 1.  prevent system catalog changes while the old cluster is running, and
> > > dump the old cluster's schema and restore it in the new cluster
> > >
> > > 2.  shut down the old cluster and copy/link the data files
> >
> > Perhaps a bit more complicated, but couldn't we copy/link while the
> > old cluster is online and in backup mode, finish backup mode, shut down
> > the old cluster, and then play forward the WAL to catch any relation
> > extents being added or similar, and then flip to the new PG version?
>
> Well, that would require reading the old WAL, which would add an
> additional compatibility requirement that seems unwise.

In my proposal, this would be the old version of PG reading the old WAL.

Thinking about it a bit further though, I'm not sure it'd end up working
in link mode anyway, due to post-backup-finish changes that could be
made by the old server on the data files before it's shut down.

We have to have a way of dealing with the delta between the hard link
trees after the old server is shut down though because there could be
new relation extents, at least.

> > > My question is whether the schema dump/restore is time-consuming enough
> > > to warrant this optional more complex API, and whether people would
> > > support adding a server setting that prevented all system table changes?
> >
> > When you say 'system table changes', you're referring to basically all
> > DDL, right?  Just wish to clarify as there might be some confusion
> > between the terminology you're using here and allow_system_table_mods.
>
> Not only all DDL, but even updating them for the internal stuff, like
> pg_class.relfrozenxid.

Good point.  We'd really need a pretty bullet-proof way to ensure that
the catalog isn't changed during this time period and that seems like it
might be difficult without a lot of work.

> > Would we need to have autovacuum shut down too..?
>
> Yes.

Ok, makes sense.

> > The other concern is if there's changes made to the catalogs by non-DDL
> > activity that needs to be addressed too (logical replication?); nothing
> > definite springs to mind off-hand for me, but perhaps others will think
> > of things.
>
> Yes, it could extend to many parts of the system, which is why I am
> asking if it is worth it.

My initial reaction is that it's worth it, but then I also wonder about
other issues (having to get an ANALYZE done on the new cluster before
opening it up, for example..) and it makes me wonder if perhaps it'll
end up being too much risk for too little gain.

Thanks!

Stephen

Re: Speeding up pg_upgrade

From
Stephen Frost
Date:
Bruce,

* Bruce Momjian (bruce@momjian.us) wrote:
> > In any case, of course, if we're able to move part of what pg_upgrade
> > does to be while the old server is online then that takes whatever the
> > cost of that is out of the downtime window.  The question is if that's a
> > 5% improvement in the overall performance of pg_upgrade or a 70% one.
> > This will be case-by-case, of course, but if, in the best-case, we only
> > get a 5% improvement then this might not be worth the risk.
>
> Yes, and who is going to know if they have a setup where the more
> complex API is worth it?  pg_upgrade is already complex enough to use.

Sure, but the solution there is really to make pg_upgrade simpler to
use, even as we add these more complicated APIs to it.  What that likely
means in practical terms is that we have another utility, which uses
pg_upgrade underneath, that you're able to configure to know about your
existing cluster and the version of PG you want to upgrade to and where
you want it and if you want a copy or if hard-links are ok, etc.

Having such a tool is actually what I'd been hoping would come out of
the documented process for doing a "pg_upgrade" on replicas that's
currently in our documentation.  That's not happened yet, but it's
something that David Steele and I have been chatting about because the
procedure in the documentation is terribly difficult and dangerous for
those who aren't as familiar with the system.

Perhaps we could have one tool that handles both the more complicated
pg_upgrade API and deals with upgrading replicas.  Alternatively, we
could have a config file for pg_upgrade instead which might be a simpler
way for people to describe exactly their current configuration and what
they'd like to go to.  Upgrading replicas involves using something like
SSH though..

Thanks!

Stephen

Re: Speeding up pg_upgrade

From
Peter Eisentraut
Date:
On 12/5/17 09:23, Stephen Frost wrote:
> Right, but that doesn't really answer the question as to which part of
> the pg_upgrade process is taking up the time.

Yeah, that should be measured before we do anything.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Speeding up pg_upgrade

From
Bruce Momjian
Date:
On Tue, Dec  5, 2017 at 09:30:53AM -0500, Stephen Frost wrote:
> > > The other concern is if there's changes made to the catalogs by non-DDL
> > > activity that needs to be addressed too (logical replication?); nothing
> > > definite springs to mind off-hand for me, but perhaps others will think
> > > of things.
> > 
> > Yes, it could extend to many parts of the system, which is why I am
> > asking if it is worth it.
> 
> My initial reaction is that it's worth it, but then I also wonder about
> other issues (having to get an ANALYZE done on the new cluster before
> opening it up, for example..) and it makes me wonder if perhaps it'll
> end up being too much risk for too little gain.

Yes, dump/reload of analyze statistics seems like a better use of time. 
I have avoided it since it locks us into supporting the text
representation of the data types, but at this point it might be worth it.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: Speeding up pg_upgrade

From
Alexander Kukushkin
Date:
Hi,


> Yes, dump/reload of analyze statistics seems like a better use of time.
> I have avoided it since it locks us into supporting the text
> representation of the data types, but at this point it might be worth it.


Couple of months ago we at Zalando upgraded a few databases of different sizes to 9.6.
During preparations for the upgrade I found 2.5 pain-points:

1. We are using schema-based API deployment. Basically ~every week we create a new schema in the database and hundreds of stored procedures in it.
    Of course we remove old API schemas and try not to keep more than the last 10. Before the upgrade we basically dropped all API schemas except the one used in production.
    And even in this case the dump-restore phase was taking much more time than relinking of the datafiles.
    Unfortunately I don't have any numbers right now, but usually a run of pg_upgrade was taking about 30-35 seconds, and about 2/3 of the time was spent in dump-restore.

2. The ANALYZE phase is a pain. I think everybody agrees with that.

2.5 Usually ANALYZE stage 1 completes quite fast and performance becomes reasonable, except in one case: some of the columns might have a non-default statistics target.
    That breaks `vacuumdb --analyze-in-stages`, because for those specific columns it will not use the default_statistics_target value provided by vacuumdb.
    What I did was reset those non-default values right before running pg_upgrade and restore them only when the analyze was completed. Of course after that I re-analyzed those columns.
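
For what it's worth, the workaround looked roughly like this (a sketch, with
the database name and file name as placeholders; attstattarget >= 0 marks an
explicitly-set, non-default target in the catalogs):

    # Save ALTER statements that restore the explicit per-column statistics targets.
    psql -d mydb -AtX -c "
      SELECT format('ALTER TABLE %I.%I ALTER COLUMN %I SET STATISTICS %s;',
                    n.nspname, c.relname, a.attname, a.attstattarget)
      FROM pg_attribute a
      JOIN pg_class c     ON c.oid = a.attrelid
      JOIN pg_namespace n ON n.oid = c.relnamespace
      WHERE a.attstattarget >= 0 AND NOT a.attisdropped AND c.relkind = 'r'
    " > restore_stats_targets.sql

    # Reset those columns to the default target before running pg_upgrade.
    psql -d mydb -AtX -c "
      SELECT format('ALTER TABLE %I.%I ALTER COLUMN %I SET STATISTICS -1;',
                    n.nspname, c.relname, a.attname)
      FROM pg_attribute a
      JOIN pg_class c     ON c.oid = a.attrelid
      JOIN pg_namespace n ON n.oid = c.relnamespace
      WHERE a.attstattarget >= 0 AND NOT a.attisdropped AND c.relkind = 'r'
    " | psql -d mydb -X

    # After pg_upgrade and vacuumdb --analyze-in-stages:
    #   psql -d mydb -X -f restore_stats_targets.sql   # then re-ANALYZE those tables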



Regards,
Alexander Kukushkin

Re: Speeding up pg_upgrade

From
Stephen Frost
Date:
Alexander,

* Alexander Kukushkin (cyberdemn@gmail.com) wrote:
> Couple of months ago we at Zalando upgraded a few databases of different
> sizes to 9.6.

Thanks for sharing your experience!

> During preparations to the I've found 2.5 pain-points:
>
> 1. We are using schema-based api deployment. Basically ~every week we
> create a new schema in the database and hundreds of stored procedures in it.
>     Off course we remove old API schemas and trying not to keep more than
> last 10. Before the upgrade we basically dropped all API schemas except the
> one used in production.
>     And even in this case dump-restore phase was taking much more time than
> relinking of datafiles.
>     Unfortunately I don't have any numbers right now, but usually run of
> pg_upgrade was taking about 30-35 seconds, and about 2/3 of the time was
> spend in dump-restore.

Ok, so eliminating 2/3 of the time would mean bringing it down to more
like 10 seconds.  That certainly seems worthwhile to me.  With the
linking time being much less than the dump/restore, we could at least
consider moving forward with Bruce's original idea where we do the
dump/restore while the system is online but then the linking with it
offline and get a serious performance boost out of it.  That also avoids
the issue with new files showing up while the system is running that I
brought up when we were talking about having the linking done with the
system online.

> 2 ANALYZE phase is a pain. I think everybody agrees with it.
>
> 2.5 Usually ANALYZE stage 1 completes quite fast and performance becomes
> reasonable, except one case: some of the columns might have non default
> statistics target.

Ok, if the stage-1 is very fast and performance is reasonable enough
after that then perhaps it's not so bad to keep it as-is for now and
focus on the dump/restore time.  That said, we should certainly also
work on improving this too.

>     It breaks `vacuumdb --analyze-in-stages`, because those specific
> columns it will not use value of default_statistics_target provided by
> vacuumdb.
>     What I did - reset those non default values right before running
> pg_upgrade and restored them only when analyze was completed. Off course
> after that I've re-analyze those columns.

Ah, yeah, ouch, that's unfortunate..  I wonder if there's something we
could do there to fix it..

Thanks!

Stephen


Re: Speeding up pg_upgrade

From
Alvaro Herrera
Date:
Stephen Frost wrote:
> Alexander,

> * Alexander Kukushkin (cyberdemn@gmail.com) wrote:

> > 2 ANALYZE phase is a pain. I think everybody agrees with it.
> > 
> > 2.5 Usually ANALYZE stage 1 completes quite fast and performance becomes
> > reasonable, except one case: some of the columns might have non default
> > statistics target.
> 
> Ok, if the stage-1 is very fast and performance is reasonable enough
> after that then perhaps it's not so bad to keep it as-is for now and
> focus on the dump/restore time.  That said, we should certainly also
> work on improving this too.

It seems pretty clear to me that we should somehow transfer stats from
the old server to the new one.  Shouldn't it just be a matter of
serializing the MCV/histogram/ndistinct values, then have capabilities
to load on the new server?  I suppose it'd just be used during binary
upgrade, but the point seems painful enough for a lot of users.
Obviously it would not be the raw contents of pg_statistic{,_ext}, but
rather something a bit higher-level.
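
As a strawman for that higher-level form, the export half could be little more
than dumping the pg_stats view, which already presents MCVs and histogram
bounds in text (output-function) form; a minimal sketch, with the database and
file names as placeholders, and the load side being exactly the new capability
discussed here:

    # Text-level snapshot of per-column optimizer statistics from the old cluster.
    psql -d mydb -X -c "
      COPY (SELECT schemaname, tablename, attname, inherited,
                   null_frac, avg_width, n_distinct,
                   most_common_vals, most_common_freqs, histogram_bounds, correlation
            FROM pg_stats)
      TO STDOUT
    " > stats_snapshot.tsv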

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Speeding up pg_upgrade

From
Stephen Frost
Date:
Alvaro,

* Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:
> Stephen Frost wrote:
> > * Alexander Kukushkin (cyberdemn@gmail.com) wrote:
>
> > > 2 ANALYZE phase is a pain. I think everybody agrees with it.
> > >
> > > 2.5 Usually ANALYZE stage 1 completes quite fast and performance becomes
> > > reasonable, except one case: some of the columns might have non default
> > > statistics target.
> >
> > Ok, if the stage-1 is very fast and performance is reasonable enough
> > after that then perhaps it's not so bad to keep it as-is for now and
> > focus on the dump/restore time.  That said, we should certainly also
> > work on improving this too.
>
> It seems pretty clear to me that we should somehow transfer stats from
> the old server to the new one.  Shouldn't it just be a matter of
> serializing the MCV/histogram/ndistinct values, then have capabilities
> to load on the new server?  I suppose it'd just be used during binary
> upgrade, but the point seems painful enough for a lot of users.
> Obviously it would not be the raw contents of pg_statistic{,_ext}, but
> rather something a bit higher-level.

Right, I think that's what Bruce was getting at and certainly makes
sense to me as well.  I agree that it's a definite pain point for
people.  One complication is going to be custom data types, of course..

Thanks!

Stephen


Re: Speeding up pg_upgrade

From
Justin Pryzby
Date:
On Tue, Dec 05, 2017 at 09:01:35AM -0500, Bruce Momjian wrote:
> As part of PGConf.Asia 2017 in Tokyo, we had an unconference topic about
> zero-downtime upgrades.  ... we discussed speeding up pg_upgrade.
> 
> There are clusters that take a long time to dump the schema from the old
> cluster

Maybe it isn't representative of a typical case, but I can offer a data point:

For us, we have ~40 customers with DBs ranging in size from <100GB to ~25TB
(for which ~90% is on a ZFS tablespace with compression).  We have what's
traditionally considered to be an excessive number of child tables, which works
okay since planning time is unimportant to us for the report queries which hit
them.  Some of the tables are wide (historically up to 1600 columns).  Some of
those have default values on nearly every column, and pg_attrdef was large
(>500MB), causing pg_dump --section pre-data to be slow (10+ minutes).  Since
something similar is run by pg_upgrade, I worked around the issue for now by
dropping defaults on the historic children in advance of upgrades (at some
point I'll figure out what I have to do to allow DROPing DEFAULTs).  It's not
the first time we've seen an issue with a large number of children*columns.

Our slowest pg_upgrade was ~40min, caused by column defaults in a case where I
failed to re-DROP DEFAULTs after our first scheduled upgrade date was pushed
back by over a month.  Most of the rest were completed in less than 15min.

Justin


Re: Speeding up pg_upgrade

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> It seems pretty clear to me that we should somehow transfer stats from
> the old server to the new one.  Shouldn't it just be a matter of
> serializing the MCV/histogram/ndistinct values, then have capabilities
> to load on the new server?

The reason pg_upgrade hasn't done that in the past is not wishing to
assume that the new version does stats identically to the old version.
Since we do in fact add stats or change stuff around from time to time,
that's not a negligible consideration.

            regards, tom lane


Re: Speeding up pg_upgrade

From
Robert Haas
Date:
On Thu, Dec 7, 2017 at 11:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
>> It seems pretty clear to me that we should somehow transfer stats from
>> the old server to the new one.  Shouldn't it just be a matter of
>> serializing the MCV/histogram/ndistinct values, then have capabilities
>> to load on the new server?
>
> The reason pg_upgrade hasn't done that in the past is not wishing to
> assume that the new version does stats identically to the old version.
> Since we do in fact add stats or change stuff around from time to time,
> that's not a negligible consideration.

Yes, but we don't do that for every release.  We could put rules into
pg_upgrade about which releases changed the stats format incompatibly,
and not transfer the stats when crossing between two releases with
incompatible formats.  That's more than zero effort, of course, but it
might be worth it.  We've already got CATALOG_VERSION_NO,
XLOG_PAGE_MAGIC, PG_CONTROL_VERSION, PG_PROTOCOL_LATEST,
BTREE_VERSION, HASH_VERSION, BRIN_CURRENT_VERSION,
GIN_CURRENT_VERSION, LOGICALREP_PROTO_VERSION_NUM,
PG_PAGE_LAYOUT_VERSION, PG_DATA_CHECKSUM_VERSION, K_VERS_MAJOR,
K_VERS_MINOR, K_VERS_REV, and the utterly unused MIGRATOR_API_VERSION.
Now, I have to admit that I find the process of trying to remember to
bump the correct set of version numbers in every commit just a tad
frustrating; it adds a cognitive burden I'd just as well skip.
However, the failure to transfer stats over the years seems to have
actually caused real problems for many users, so I think in this case
we might be best off sucking it up and adding one more version number.

We might even want to make it a little more fine-grained and track it
separately by data type, but I'm not sure if that's really worth it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Speeding up pg_upgrade

From
Stephen Frost
Date:
Robert, all,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Thu, Dec 7, 2017 at 11:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> >> It seems pretty clear to me that we should somehow transfer stats from
> >> the old server to the new one.  Shouldn't it just be a matter of
> >> serializing the MCV/histogram/ndistinct values, then have capabilities
> >> to load on the new server?
> >
> > The reason pg_upgrade hasn't done that in the past is not wishing to
> > assume that the new version does stats identically to the old version.
> > Since we do in fact add stats or change stuff around from time to time,
> > that's not a negligible consideration.
>
> Yes, but we don't do that for every release.  We could put rules into
> pg_upgrade about which releases changed the stats format incompatibly,
> and not transfer the stats when crossing between two releases with
> incompatible formats.  That's more than zero effort, of course, but it
> might be worth it.  We've already got CATALOG_VERSION_NO,
> XLOG_PAGE_MAGIC, PG_CONTROL_VERSION, PG_PROTOCOL_LATEST,
> BTREE_VERSION, HASH_VERSION, BRIN_CURRENT_VERSION,
> GIN_CURRENT_VERSION, LOGICALREP_PROTO_VERSION_NUM,
> PG_PAGE_LAYOUT_VERSION, PG_DATA_CHECKSUM_VERSION, K_VERS_MAJOR,
> K_VERS_MINOR, K_VERS_REV, and the utterly unused MIGRATOR_API_VERSION.

If we go down that route, since this makes a pretty serious difference
in terms of what the user has to deal with post-pg_upgrade, I'd suggest
we require an additional option for the user to pass when stats aren't
going to be migrated, so they are aware of that.

The concern I have here is that we end up changing things in v13 and
suddenly everyone has to re-analyze when they didn't have to going from
10->11 or 11->12, and they'll get caught off-guard by it.

Of course, this might end up having an entirely different effect: it
might mean that we're suddenly a lot shier about changing the stats in a
backwards-incompatible way, just as we now are basically stuck with the
existing on-disk heap format..

> Now, I have to admit that I find the process of trying to remember to
> bump the correct set of version numbers in every commit just a tad
> frustrating; it adds a cognitive burden I'd just as well skip.

Agreed, would be great if we could improve on this.

> However, the failure to transfer stats over the years seems to have
> actually caused real problems for many users, so I think in this case
> we might be best off sucking it up and adding one more version number.

Yes, it's definitely been an issue for users.

> We might even want to make it a little more fine-grained and track it
> separately by data type, but I'm not sure if that's really worth it.

This would have the nice property that we could just re-analyze the data
types where things changed, something that's more likely to happen with
new data types than existing ones, I'd guess, and so that might be much
more reasonable for users.

Thanks!

Stephen


Re: Speeding up pg_upgrade

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > It seems pretty clear to me that we should somehow transfer stats from
> > the old server to the new one.  Shouldn't it just be a matter of
> > serializing the MCV/histogram/ndistinct values, then have capabilities
> > to load on the new server?
> 
> The reason pg_upgrade hasn't done that in the past is not wishing to
> assume that the new version does stats identically to the old version.
> Since we do in fact add stats or change stuff around from time to time,
> that's not a negligible consideration.

Sure, but the new version can probably limp along with incomplete stats
until the next natural ANALYZE runs -- the system is operational in much
shorter time than if you have to make it all wait for the post-upgrade
full-database ANALYZE run.  The serialization step is so that the
underlying representation doesn't have to remain identical -- surely the
new server would be able to represent whatever the old server was able
to, regardless of any improvement made.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Speeding up pg_upgrade

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Tom Lane wrote:
>> The reason pg_upgrade hasn't done that in the past is not wishing to
>> assume that the new version does stats identically to the old version.
>> Since we do in fact add stats or change stuff around from time to time,
>> that's not a negligible consideration.

> Sure, but the new version can probably limp along with incomplete stats
> until the next natural ANALYZE runs -- the system is operational in much
> shorter time than if you have to make it all wait for the post-upgrade
> full-database ANALYZE run.  The serialization step is so that the
> underlying representation doesn't have to remain identical -- surely the
> new server would be able to represent whatever the old server was able
> to, regardless of any improvement made.

Well, this is assuming a lot of stuff not in evidence about how the
"serialization format" is designed and how we insert the data in the
new installation.  But if you think you can come up with something
that can handle such cases, go for it.

(In the spirit of full disclosure, I actually wrote code that allowed
dumping and reloading stats while I was at Salesforce.  But I've forgotten
the details of that design, and anyway I'm pretty sure it didn't handle
any cross-version scenarios, else I probably would have offered it to
the community.)

            regards, tom lane


Re: Speeding up pg_upgrade

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> If we go down that route, since this makes a pretty serious difference
> in terms of what the user has to deal with post-pg_upgrade, I'd suggest
> we require an additional option for the user to pass when stats aren't
> going to be migrated, so they are aware of that.

-1 ... you are forgetting that a lot of systems wrap pg_upgrade in some
sort of vendor-supplied upgrade script.  Nanny switches don't help;
the vendors will just start passing them automatically.

> Of course, this might end up having an entirely different effect: it
> might mean that we're suddenly a lot shier about changing the stats in a
> backwards-incompatible way, just as we now are basically stuck with the
> existing on-disk heap format..

Yeah, there's that.  But the rate of change in pg_statistic hasn't been
*that* large.  Alvaro might be right that we can design some transmission
procedure that allows stats to be forward-migrated when compatible and
dropped when not.

            regards, tom lane


Re: Speeding up pg_upgrade

From
Stephen Frost
Date:
Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > If we go down that route, since this makes a pretty serious difference
> > in terms of what the user has to deal with post-pg_upgrade, I'd suggest
> > we require an additional option for the user to pass when stats aren't
> > going to be migrated, so they are aware of that.
>
> -1 ... you are forgetting that a lot of systems wrap pg_upgrade in some
> sort of vendor-supplied upgrade script.  Nanny switches don't help;
> the vendors will just start passing them automatically.

That really depends on the packagers.

> > Of course, this might end up having an entirely different effect: it
> > might mean that we're suddenly a lot shier about changing the stats in a
> > backwards-incompatible way, just as we now are basically stuck with the
> > existing on-disk heap format..
>
> Yeah, there's that.  But the rate of change in pg_statistic hasn't been
> *that* large.  Alvaro might be right that we can design some transmission
> procedure that allows stats to be forward-migrated when compatible and
> dropped when not.

Well, if it's dropped, I think we need to make sure that users are aware
of that going in and that's why I was suggesting a switch.  If you've
got a better idea for that, great, but having certain pg_upgrade
migrations require running ANALYZE and some migrations not require it is
something we need to make users *very* clear about.  No, I don't think a
note in the release notes is really enough..

Thanks!

Stephen


Re: Speeding up pg_upgrade

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> Yeah, there's that.  But the rate of change in pg_statistic hasn't been
>> *that* large.  Alvaro might be right that we can design some transmission
>> procedure that allows stats to be forward-migrated when compatible and
>> dropped when not.

> Well, if it's dropped, I think we need to make sure that users are aware
> of that going in and that's why I was suggesting a switch.  If you've
> got a better idea for that, great, but having certain pg_upgrade
> migrations require running ANALYZE and some migrations not require it is
> something we need to make users *very* clear about.  No, I don't think a
> note in the release notes is really enough..

Seems like we could make this reasonably transparent if pg_upgrade
continues to emit an analyze script that you're supposed to run
afterwards.  It just has to vary how much that script does.
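
(For reference, the script pg_upgrade emits today essentially boils down to
the line below; under this scheme it would simply be generated with, without,
or with a reduced version of that step depending on whether the statistics
were carried over.)

    # analyze_new_cluster.sh, roughly as generated today:
    vacuumdb --all --analyze-in-stages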

            regards, tom lane


Re: Speeding up pg_upgrade

From
"David G. Johnston"
Date:
On Thu, Dec 7, 2017 at 12:04 PM, Stephen Frost <sfrost@snowman.net> wrote:
> If you've
> got a better idea for that, great, but having certain pg_upgrade
> migrations require running ANALYZE and some migrations not require it is
> something we need to make users *very* clear about.  No, I don't think a
> note in the release notes is really enough..

Ideally this would be covered by:

"""
13. Post-Upgrade processing

​If any post-upgrade processing is required, pg_upgrade will issue warnings as it completes. It will also generate script files that must be run by the administrator. The script files will connect to each database that needs post-upgrade processing. Each script should be run using: ...
"""


Then whether ANALYZE is instructed as per #14 would be conditional.

It's arguable whether a warning/notice saying "ANALYZE doesn't need to be run" would be useful.

David J.

Re: Speeding up pg_upgrade

From
Stephen Frost
Date:
Tom, David,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > Well, if it's dropped, I think we need to make sure that users are aware
> > of that going in and that's why I was suggesting a switch.  If you've
> > got a better idea for that, great, but having certain pg_upgrade
> > migrations require running ANALYZE and some migrations not require it is
> > something we need to make users *very* clear about.  No, I don't think a
> > note in the release notes is really enough..
>
> Seems like we could make this reasonably transparent if pg_upgrade
> continues to emit an analyze script that you're supposed to run
> afterwards.  It just has to vary how much that script does.

That's post-upgrade though, meaning it's entirely likely to catch
someone by surprise.  That's the issue that I have with it.

That said, I'm remembering that there's a '--check' option to
pg_upgrade.  Let's make sure to include in that --check output what
additional post-upgrade work will have to happen, that'll work well
enough to deal with this issue and I have little sympathy for people who
run pg_upgrade blindly and without running --check first, given that
there's an option for it.
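
Something along these lines, run before the downtime window, could then also
report whether a post-upgrade ANALYZE will be needed (the --check option and
the connection flags exist today; the extra report line is only a suggestion):

    # Dry run against the still-running old cluster; $OLD_*/$NEW_* are placeholders.
    pg_upgrade -b "$OLD_BIN" -B "$NEW_BIN" -d "$OLD_DATA" -D "$NEW_DATA" --check
    # Suggested additional --check output (not implemented):
    #   "Optimizer statistics will not be transferred; run the generated
    #    analyze script after the upgrade."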

* David G. Johnston (david.g.johnston@gmail.com) wrote:
> On Thu, Dec 7, 2017 at 12:04 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > If you've
> > got a better idea for that, great, but having certain pg_upgrade
> > migrations require running ANALYZE and some migrations not require it is
> > something we need to make users *very* clear about.  No, I don't think a
> > note in the release notes is really enough..
>
> Ideally this would be covered by:
>
> """
> 13. Post-Upgrade processing
>
> If any post-upgrade processing is required, pg_upgrade will issue warnings
> as it completes. It will also generate script files that must be run by the
> administrator. The script files will connect to each database that needs
> post-upgrade processing. Each script should be run using: ...
> """
>
> https://www.postgresql.org/docs/10/static/pgupgrade.html
>
> Then whether is instructed as per #14 would be conditional.
>
> Its arguable whether a warning/notice that "ANALYZE doesn't need to be run"
> message would be useful.

I'm fine with updating the docs to make it conditional but our users
need an easy way to figure out if their particular from/to PG versions
require ANALYZE to be run or not, so they're able to prepare for it.
Having that be included in the --check output would work to handle that
nicely.

Also, really, we should update those steps to have --check run earlier,
imv, and before saying "stop the clusters".

Thanks!

Stephen


Re: Speeding up pg_upgrade

From
Bruce Momjian
Date:
On Thu, Dec  7, 2017 at 10:37:30AM -0500, Stephen Frost wrote:
> Alexander,
> 
> * Alexander Kukushkin (cyberdemn@gmail.com) wrote:
> > Couple of months ago we at Zalando upgraded a few databases of different
> > sizes to 9.6.
> 
> Thanks for sharing your experience!
> 
> > During preparations to the I've found 2.5 pain-points:
> > 
> > 1. We are using schema-based api deployment. Basically ~every week we
> > create a new schema in the database and hundreds of stored procedures in it.
> >     Off course we remove old API schemas and trying not to keep more than
> > last 10. Before the upgrade we basically dropped all API schemas except the
> > one used in production.
> >     And even in this case dump-restore phase was taking much more time than
> > relinking of datafiles.
> >     Unfortunately I don't have any numbers right now, but usually run of
> > pg_upgrade was taking about 30-35 seconds, and about 2/3 of the time was
> > spend in dump-restore.
> 
> Ok, so eliminating 2/3 of the time would mean bringing it down to more
> like 10 seconds.  That certainly seems worthwhile to me.  With the
> linking time being much less than the dump/restore, we could at least
> consider moving forward with Bruce's original idea where we do the
> dump/restore while the system is online but then the linking with it
> offline and get a serious performance boost out of it.  That also avoids
> the issue with new files showing up while the system is running that I
> brought up when we were talking about having the linking done with the
> system online.
> 
> > 2 ANALYZE phase is a pain. I think everybody agrees with it.
> > 
> > 2.5 Usually ANALYZE stage 1 completes quite fast and performance becomes
> > reasonable, except one case: some of the columns might have non default
> > statistics target.
> 
> Ok, if the stage-1 is very fast and performance is reasonable enough
> after that then perhaps it's not so bad to keep it as-is for now and
> focus on the dump/restore time.  That said, we should certainly also
> work on improving this too.

I think the big problem with a two-stage pg_upgrade is that the user steps
are more complex, so what percentage of users are going to use the
two-stage method?  The bad news is that only a small percentage of users
who will benefit from it will use it, and some who will not benefit from it
will use it.  Also, this is going to require significant server changes,
which have to be maintained.

I think we need some statistics on how many users are going to benefit
from this, and how are users suppose to figure out if they will benefit
from it?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: Speeding up pg_upgrade

From
Mark Dilger
Date:
> On Dec 7, 2017, at 10:24 PM, Bruce Momjian <bruce@momjian.us> wrote:
>
> On Thu, Dec  7, 2017 at 10:37:30AM -0500, Stephen Frost wrote:
>> Alexander,
>>
>> * Alexander Kukushkin (cyberdemn@gmail.com) wrote:
>>> Couple of months ago we at Zalando upgraded a few databases of different
>>> sizes to 9.6.
>>
>> Thanks for sharing your experience!
>>
>>> During preparations to the I've found 2.5 pain-points:
>>>
>>> 1. We are using schema-based api deployment. Basically ~every week we
>>> create a new schema in the database and hundreds of stored procedures in it.
>>>    Off course we remove old API schemas and trying not to keep more than
>>> last 10. Before the upgrade we basically dropped all API schemas except the
>>> one used in production.
>>>    And even in this case dump-restore phase was taking much more time than
>>> relinking of datafiles.
>>>    Unfortunately I don't have any numbers right now, but usually run of
>>> pg_upgrade was taking about 30-35 seconds, and about 2/3 of the time was
>>> spend in dump-restore.
>>
>> Ok, so eliminating 2/3 of the time would mean bringing it down to more
>> like 10 seconds.  That certainly seems worthwhile to me.  With the
>> linking time being much less than the dump/restore, we could at least
>> consider moving forward with Bruce's original idea where we do the
>> dump/restore while the system is online but then the linking with it
>> offline and get a serious performance boost out of it.  That also avoids
>> the issue with new files showing up while the system is running that I
>> brought up when we were talking about having the linking done with the
>> system online.
>>
>>> 2 ANALYZE phase is a pain. I think everybody agrees with it.
>>>
>>> 2.5 Usually ANALYZE stage 1 completes quite fast and performance becomes
>>> reasonable, except one case: some of the columns might have non default
>>> statistics target.
>>
>> Ok, if the stage-1 is very fast and performance is reasonable enough
>> after that then perhaps it's not so bad to keep it as-is for now and
>> focus on the dump/restore time.  That said, we should certainly also
>> work on improving this too.
>
> I think the big problem with two-stage pg_upgrade is that the user steps
> are more complex, so what percentage of users are going use the
> two-stage method.  The bad news is that only a small percentage of users
> who will benefit from it will use it, and some who will not benefit it
> will use it.  Also, this is going to require significant server changes,
> which have to be maintained.

In my fork of the project, back when I was tracking 9.5, I added an option
to vacuum/analyze to make it behave a bit more like autovac, so that I could
run

    ANALYZE CONDITIONALLY;

and it would only analyze those tables in the system which autovac would
analyze.  In the grammar, CONDITIONALLY gets translated into a
VacuumOption flag.  In vacuum (in src/backend/commands/vacuum.c), inside
the "Loop to process each selected relation", if this flag is set, it checks the
PgStat_StatTabEntry for the table to determine whether to vacuum or analyze
the table.

I think this extension would be helpful in the context of the current conversation.
In those cases where pg_upgrade was able to migrate the statistics to the
new database, as long as it set the PgStat_StatTabEntry for each table where
statistics were migrated, then the user would just have to execute a
"VACUUM CONDITIONALLY" after upgrade, and the database would either
do a lot of analyze work, a little analyze work, or no analyze work depending
on which tables needed analyzing.

The main advantage here is that the user would always run this command
after pg_upgrade, without having to think about whether pg_upgrade had
migrated statistics or not.

If the community thinks this is useful, I could put together a patch.
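
Until something like that exists in the grammar, the effect can be
approximated from the command line using the cumulative statistics views,
e.g. analyzing only the tables that have never been analyzed on the new
cluster (a sketch with a placeholder database name; the heuristic is much
cruder than autovacuum's thresholds):

    psql -d mydb -AtX -c "
      SELECT format('ANALYZE %I.%I;', schemaname, relname)
      FROM pg_stat_user_tables
      WHERE last_analyze IS NULL AND last_autoanalyze IS NULL
    " | psql -d mydb -X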

mark

Re: Speeding up pg_upgrade

From
Stephen Frost
Date:
Mark,

* Mark Dilger (hornschnorter@gmail.com) wrote:
> > On Dec 7, 2017, at 10:24 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > I think the big problem with two-stage pg_upgrade is that the user steps
> > are more complex, so what percentage of users are going use the
> > two-stage method.  The bad news is that only a small percentage of users
> > who will benefit from it will use it, and some who will not benefit it
> > will use it.  Also, this is going to require significant server changes,
> > which have to be maintained.
>
> In my fork of the project, back when I was tracking 9.5, I added an option
> to vacuum/analyze to make it behave a bit more like autovac, so that I could
> run
>
>     ANALYZE CONDITIONALLY;
>
> and it would only analyze those tables in the system which autovac would
> analyze.  In the grammar, CONDITIONALLY gets translated into a
> VacuumOption flag.  In vacuum (in src/backend/commands/vacuum.c), inside
> the "Loop to process each selected relation", if this flag is set, it checks the
> PgStat_StatTabEntry for the table to determine whether to vacuum or analyze
> the table.
>
> I think this extension would be helpful in the context of the current conversation.
> In those cases where pg_upgrade was able to migrate the statistics to the
> new database, as long as it set the PgStat_StatTabEntry for each table where
> statistics were migrated, then the user would just have to execute a
> "VACUUM CONDITIONALLY" after upgrade, and the database would either
> do a lot of analyze work, a little analyze work, or no analyze work depending
> on which tables needed analyzing.
>
> The main advantage here is that the user would always run this command
> after pg_upgrade, without having to think about whether pg_upgrade had
> migrated statistics or not.
>
> If the community thinks this is useful, I could put together a patch.

This certainly sounds nice though I have to admit to being a bit
skeptical on the keyword selection, but perhaps blue really is the right
color for that bike shed.

One thing I'd wonder about is if that makes 'CONDITIONALLY' into a
reserved keyword, which wouldn't be ideal.  Perhaps a bit of a stretch
but 'ANALYZE ALL NEEDED' might avoid that?

Thanks!

Stephen


Re: Speeding up pg_upgrade

From
Mark Dilger
Date:
> On Dec 8, 2017, at 9:21 AM, Stephen Frost <sfrost@snowman.net> wrote:
>
> Mark,
>
> * Mark Dilger (hornschnorter@gmail.com) wrote:
>>> On Dec 7, 2017, at 10:24 PM, Bruce Momjian <bruce@momjian.us> wrote:
>>> I think the big problem with two-stage pg_upgrade is that the user steps
>>> are more complex, so what percentage of users are going use the
>>> two-stage method.  The bad news is that only a small percentage of users
>>> who will benefit from it will use it, and some who will not benefit it
>>> will use it.  Also, this is going to require significant server changes,
>>> which have to be maintained.
>>
>> In my fork of the project, back when I was tracking 9.5, I added an option
>> to vacuum/analyze to make it behave a bit more like autovac, so that I could
>> run
>>
>>     ANALYZE CONDITIONALLY;
>>
>> and it would only analyze those tables in the system which autovac would
>> analyze.  In the grammar, CONDITIONALLY gets translated into a
>> VacuumOption flag.  In vacuum (in src/backend/commands/vacuum.c), inside
>> the "Loop to process each selected relation", if this flag is set, it checks the
>> PgStat_StatTabEntry for the table to determine whether to vacuum or analyze
>> the table.
>>
>> I think this extension would be helpful in the context of the current conversation.
>> In those cases where pg_upgrade was able to migrate the statistics to the
>> new database, as long as it set the PgStat_StatTabEntry for each table where
>> statistics were migrated, then the user would just have to execute a
>> "VACUUM CONDITIONALLY" after upgrade, and the database would either
>> do a lot of analyze work, a little analyze work, or no analyze work depending
>> on which tables needed analyzing.
>>
>> The main advantage here is that the user would always run this command
>> after pg_upgrade, without having to think about whether pg_upgrade had
>> migrated statistics or not.
>>
>> If the community thinks this is useful, I could put together a patch.
>
> This certainly sounds nice though I have to admit to being a bit
> skeptical on the keyword selection, but perhaps blue really is the right
> color for that bike shed.
>
> One thing I'd wonder about is if that makes 'CONDITIONALLY' into a
> reserved keyword, which wouldn't be ideal.  Perhaps a bit of a stretch
> but 'ANALYZE ALL NEEDED' might avoid that?

Yeah, I expected some complaint about CONDITIONALLY, and I don't have
any personal feelings about the choice of terms.  I'm happy to go with
your choice, or whatever the community decides.

mark



Re: Speeding up pg_upgrade

From
Stephen Frost
Date:
Bruce,

* Bruce Momjian (bruce@momjian.us) wrote:
> I think the big problem with two-stage pg_upgrade is that the user steps
> are more complex, so what percentage of users are going use the
> two-stage method.  The bad news is that only a small percentage of users
> who will benefit from it will use it, and some who will not benefit it
> will use it.  Also, this is going to require significant server changes,
> which have to be maintained.

This is where I think we need to be considering a higher-level tool that
makes it easier to run pg_upgrade and which could handle these multiple
stages.

> I think we need some statistics on how many users are going to benefit
> from this, and how are users suppose to figure out if they will benefit
> from it?

If the complexity issue is addressed, then wouldn't all users who use
pg_upgrade in link mode benefit from this..?  Or are you thinking we
need to distinguish the users who really need pg_upgrade to be faster
from those who don't?  The former would be 100% and the latter seems
extremely difficult to determine and not all that useful in the end- not
every user needs SELECT to be faster, but we sure want to make it faster
if we can do so reasonably.

I agree that we need to really consider if the additional complexity is
worth the performance improvement, but I don't think we really have a
gauge as to the complexity level or the ongoing maintenance effort
required, and without that we can't really say if it's too much or not.

Thanks!

Stephen


Re: Speeding up pg_upgrade

From
Bruce Momjian
Date:
On Fri, Dec  8, 2017 at 12:26:55PM -0500, Stephen Frost wrote:
> Bruce,
> 
> * Bruce Momjian (bruce@momjian.us) wrote:
> > I think the big problem with two-stage pg_upgrade is that the user steps
> > are more complex, so what percentage of users are going to use the
> > two-stage method?  The bad news is that only a small percentage of users
> > who will benefit from it will use it, and some who will not benefit from
> > it will use it.  Also, this is going to require significant server changes,
> > which have to be maintained.
> 
> This is where I think we need to be considering a higher-level tool that
> makes it easier to run pg_upgrade and which could handle these multiple
> stages.
> 
> > I think we need some statistics on how many users are going to benefit
> > from this, and how are users supposed to figure out if they will benefit
> > from it?
> 
> If the complexity issue is addressed, then wouldn't all users who use
> pg_upgrade in link mode benefit from this..?  Or are you thinking we

The instructions in the docs are going to be more complex.  We don't
have any planned way to make the two-stage approach the same complexity
as the single-stage approach.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
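
To make that concrete, a hypothetical two-stage run might look something like
the sketch below.  The --stage option is invented here purely for
illustration; pg_upgrade has no such flag today, and the directory layout is
just an example.

    # Stage 1: old cluster still running with catalog changes blocked;
    # dump the old schema and restore it into the new cluster.
    pg_upgrade --stage=schema \
        -b /usr/pgsql-10/bin -B /usr/pgsql-11/bin \
        -d /var/lib/pgsql/10/data -D /var/lib/pgsql/11/data --link

    # Stage 2: old cluster shut down; link the data files and finish.
    pg_upgrade --stage=files \
        -b /usr/pgsql-10/bin -B /usr/pgsql-11/bin \
        -d /var/lib/pgsql/10/data -D /var/lib/pgsql/11/data --link

Compared with the single pg_upgrade --link invocation users run today, that
is the extra documentation and operational surface being discussed.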


Re: Speeding up pg_upgrade

From
Stephen Frost
Date:
Bruce,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Fri, Dec  8, 2017 at 12:26:55PM -0500, Stephen Frost wrote:
> > * Bruce Momjian (bruce@momjian.us) wrote:
> > > I think the big problem with two-stage pg_upgrade is that the user steps
> > > are more complex, so what percentage of users are going to use the
> > > two-stage method?  The bad news is that only a small percentage of users
> > > who will benefit from it will use it, and some who will not benefit from
> > > it will use it.  Also, this is going to require significant server changes,
> > > which have to be maintained.
> >
> > This is where I think we need to be considering a higher-level tool that
> > makes it easier to run pg_upgrade and which could handle these multiple
> > stages.
> >
> > > I think we need some statistics on how many users are going to benefit
> > > from this, and how are users supposed to figure out if they will benefit
> > > from it?
> >
> > If the complexity issue is addressed, then wouldn't all users who use
> > pg_upgrade in link mode benefit from this..?  Or are you thinking we
>
> The instructions in the docs are going to be more complex.  We don't
> have any planned way to make the two-stage approach the same complexity
> as the single-stage approach.

A lot of the complexity in the current approach comes from all the different
steps that we take in the pg_upgrade process, which is largely driven by
the lack of a higher-level tool to handle all those steps.

The distributions have tried to address that by having their own tools
and scripts to simplify the process.  Debian-based systems have
pg_upgradecluster, which is a single command that handles everything.
The RPM-based systems have the 'setup' script that's pretty similar.

If pg_upgrade had a two-stage process, those scripts would be updated to
take advantage of that, I expect, and users would be largely shielded
from the increase in complexity.

One of the things those scripts are able to take advantage of is that
they have configuration files (or hard-coded knowledge) about where
everything is on the system, what clusters exist, etc.  That's
information that pg_upgrade itself doesn't have and therefore we can't
really automate in pg_upgrade.  One option would be to build support for
pg_upgrade to have a config file that has all of that information and
then make pg_upgrade (or another tool) able to manage the process.

Given that the major distributions already have that, I'm not entirely
sure that it's something we should be trying to duplicate since I don't
think we'd be able to do so in a way that integrates to the same level
that the distribution-specific tools do, but perhaps we should try, or
maybe work with the distributions to have a way for them to generate a
config file with the info we need..?  Just brainstorming here.

Thanks!

Stephen
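
Purely as a straw man, and with every file and key name below invented for
illustration (nothing like this exists today), such a config file might look
something like:

    ; hypothetical pg_upgrade.conf
    [old]
    bindir  = /usr/lib/postgresql/10/bin
    datadir = /var/lib/postgresql/10/main
    port    = 5432

    [new]
    bindir  = /usr/lib/postgresql/11/bin
    datadir = /var/lib/postgresql/11/main
    port    = 5433

    [options]
    mode = link

A distribution's tooling already knows all of these values, so it could
generate the file, and pg_upgrade (or a wrapper around it) could then drive
the whole process from it.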


Re: Speeding up pg_upgrade

From
Bruce Momjian
Date:
On Sat, Dec  9, 2017 at 08:45:14AM -0500, Stephen Frost wrote:
> Bruce,
> 
> * Bruce Momjian (bruce@momjian.us) wrote:
> > On Fri, Dec  8, 2017 at 12:26:55PM -0500, Stephen Frost wrote:
> > > * Bruce Momjian (bruce@momjian.us) wrote:
> > > > I think the big problem with two-stage pg_upgrade is that the user steps
> > > > are more complex, so what percentage of users are going to use the
> > > > two-stage method?  The bad news is that only a small percentage of users
> > > > who will benefit from it will use it, and some who will not benefit from
> > > > it will use it.  Also, this is going to require significant server changes,
> > > > which have to be maintained.
> > > 
> > > This is where I think we need to be considering a higher-level tool that
> > > makes it easier to run pg_upgrade and which could handle these multiple
> > > stages.
> > > 
> > > > I think we need some statistics on how many users are going to benefit
> > > > from this, and how are users supposed to figure out if they will benefit
> > > > from it?
> > > 
> > > If the complexity issue is addressed, then wouldn't all users who use
> > > pg_upgrade in link mode benefit from this..?  Or are you thinking we
> > 
> > The instructions in the docs are going to be more complex.  We don't
> > have any planned way to make the two-stage approach the same complexity
> > as the single-stage approach.
> 
> A lot of the complexity in the current approach comes from all the different
> steps that we take in the pg_upgrade process, which is largely driven by
> the lack of a higher-level tool to handle all those steps.

Sorry for the late reply.  Yes, that is _exactly_ the problem, stated
better than I have in the past.

> The distributions have tried to address that by having their own tools
> and scripts to simplify the process.  Debian-based systems have
> pg_upgradecluster, which is a single command that handles everything.
> The RPM-based systems have the 'setup' script that's pretty similar.

Yes, that is because the distribution scripts _have_ to deal with these
higher-level details, so they are the logical place to automate
pg_upgrade.

> If pg_upgrade had a two-stage process, those scripts would be updated to
> take advantage of that, I expect, and users would be largely shielded
> from the increase in complexity.

Yes, that is certainly possible.

> One of the things those scripts are able to take advantage of is that
> they have configuration files (or hard-coded knowledge) about where
> everything is on the system, what clusters exist, etc.  That's
> information that pg_upgrade itself doesn't have and therefore we can't

I think the big problem with distribution scripts automating pg_upgrade
is that each distribution has to solve the problem on its own,
and pg_upgrade is complex enough that this is a significant burden.

> really automate in pg_upgrade.  One option would be to build support for
> pg_upgrade to have a config file that has all of that information and
> then make pg_upgrade (or another tool) able to manage the process.

Yes, that is an interesting idea.

> Given that the major distributions already have that, I'm not entirely
> sure that it's something we should be trying to duplicate since I don't
> think we'd be able to do so in a way that integrates to the same level
> that the distribution-specific tools do, but perhaps we should try, or
> maybe work with the distributions to have a way for them to generate a
> config file with the info we need..?  Just brainstorming here.

Yes, thanks for the feedback.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: Speeding up pg_upgrade

From
Jeff Janes
Date:
On Thu, Dec 7, 2017 at 11:28 AM, Justin Pryzby <pryzby@telsasoft.com> wrote:
On Tue, Dec 05, 2017 at 09:01:35AM -0500, Bruce Momjian wrote:
> As part of PGConf.Asia 2017 in Tokyo, we had an unconference topic about
> zero-downtime upgrades.  ... we discussed speeding up pg_upgrade.
>
> There are clusters that take a long time to dump the schema from the old
> cluster

Maybe it isn't representative of a typical case, but I can offer a data point:

For us, we have ~40 customers with DBs ranging in size from <100GB to ~25TB
(for which ~90% is on a ZFS tablespace with compression).  We have what's
traditionally considered to be an excessive number of child tables, which works
okay since planning time is unimportant to us for the report queries which hit
them.  Some of the tables are wide (historically up to 1600 columns).  Some of
those have default values on nearly every column, and pg_attrdef was large
(>500MB), causing pg_dump --section pre-data to be slow (10+ minutes).  Since
something similar is run by pg_upgrade, I worked around the issue for now by
dropping defaults on the historic children in advance of upgrades (at some
point I'll figure out what I have to do to allow DROPing DEFAULTs).  It's not
the first time we've seen an issue with larger number of children*columns.

This is probably worth fixing independent of other ways of speeding up pg_upgrade.

It spends most of its time making the column names unique while deparsing the DEFAULT clause for each column.  But I don't think it ever outputs the column name which results from that deparsing, and since there is only one table involved, the names should already be unique anyway, unless I am missing something.

The time seems to be quadratic in the number of columns if all columns have defaults, or proportional to the product of the number of columns in the table and the number of columns with defaults.

The CREATE TABLE has a similar problem upon restoring the dump.

Cheers,

Jeff
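
For anyone who wants to reproduce this, here is a minimal sketch (the table
and column names are invented, and the column count, or the number of child
tables as in Justin's case, may need to be scaled up before the effect is
obvious):

    -- Build a wide table where every column has a DEFAULT.
    DO $$
    DECLARE
        cols text;
    BEGIN
        SELECT string_agg(format('c%s integer DEFAULT 0', i), ', ')
          INTO cols
          FROM generate_series(1, 1200) AS i;
        EXECUTE format('CREATE TABLE wide_defaults (%s)', cols);
    END
    $$;

    -- Then time the pre-data schema dump of just this table, e.g.:
    --     pg_dump --schema-only --section=pre-data -t wide_defaults <dbname>

Comparing the run time at, say, 300, 600, and 1200 columns should show the
growth described above.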