Re: Speeding up pg_upgrade - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Speeding up pg_upgrade
Msg-id 20180103174939.GC28459@momjian.us
In response to Re: Speeding up pg_upgrade  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On Sat, Dec  9, 2017 at 08:45:14AM -0500, Stephen Frost wrote:
> Bruce,
> 
> * Bruce Momjian (bruce@momjian.us) wrote:
> > On Fri, Dec  8, 2017 at 12:26:55PM -0500, Stephen Frost wrote:
> > > * Bruce Momjian (bruce@momjian.us) wrote:
> > > > I think the big problem with two-stage pg_upgrade is that the user steps
> > > > are more complex, so what percentage of users are going to use the
> > > > two-stage method?  The bad news is that only a small percentage of users
> > > > who will benefit from it will use it, and some who will not benefit from it
> > > > will use it.  Also, this is going to require significant server changes,
> > > > which have to be maintained.
> > > 
> > > This is where I think we need to be considering a higher-level tool that
> > > makes it easier to run pg_upgrade and which could handle these multiple
> > > stages.
> > > 
> > > > I think we need some statistics on how many users are going to benefit
> > > > from this, and how are users supposed to figure out if they will benefit
> > > > from it?
> > > 
> > > If the complexity issue is addressed, then wouldn't all users who use
> > > pg_upgrade in link mode benefit from this..?  Or are you thinking we
> > 
> > The instructions in the docs are going to be more complex.  We don't
> > have any planned way to make the two-stage approach the same complexity
> > as the single-stage approach.
> 
> A lot of the complexity in the current approach is all the different
> steps that we take in the pg_upgrade process, which is largely driven by
> the lack of a higher-level tool to handle all those steps.

Sorry for the late reply.  Yes, that is _exactly_ the problem, stated
better than I have in the past.

> The distributions have tried to address that by having their own tools
> and scripts to simplify the process.  Debian-based systems have
> pg_upgradecluster which is a single command that handles everything.
> The RPM-based systems have the 'setup' script that's pretty similar.

Yes, that is because the distributions' scripts _have_ to deal with these
higher-level details, so they are the logical place to automate
pg_upgrade.

> If pg_upgrade had a two-stage process, those scripts would be updated to
> take advantage of that, I expect, and users would be largely shielded
> from the increase in complexity.

Yes, that is certainly possible.

> One of the things that those scripts are able to take advantage of is
> the configuration files (or hard-coded knowledge) they have about
> where everything is on the system, what clusters exist, etc.  That's
> information that pg_upgrade itself doesn't have and therefore we can't

I think the big problem with distribution scripts automating pg_upgrade
is that each distribution has to solve the problem on its own,
and pg_upgrade is complex enough that this is a significant burden.

> really automate in pg_upgrade.  One option would be to build support for
> pg_upgrade to have a config file that has all of that information and
> then make pg_upgrade (or another tool) able to manage the process.

Yes, that is an interesting idea.

> Given that the major distributions already have that, I'm not entirely
> sure that it's something we should be trying to duplicate since I don't
> think we'd be able to do so in a way that integrates to the same level
> that the distribution-specific tools do, but perhaps we should try, or
> maybe work with the distributions to have a way for them to generate a
> config file with the info we need..?  Just brainstorming here.

Yes, thanks for the feedback.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

