Re: Speeding up pg_upgrade - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: Speeding up pg_upgrade
Date
Msg-id CAMkU=1x-e+maqefhM1yMeSiJ8J9Z+SJHgW7c9bqo3E3JMG4iJA@mail.gmail.com
Whole thread Raw
In response to Re: Speeding up pg_upgrade  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
On Thu, Dec 7, 2017 at 11:28 AM, Justin Pryzby <pryzby@telsasoft.com> wrote:
On Tue, Dec 05, 2017 at 09:01:35AM -0500, Bruce Momjian wrote:
> As part of PGConf.Asia 2017 in Tokyo, we had an unconference topic about
> zero-downtime upgrades.  ... we discussed speeding up pg_upgrade.
>
> There are clusters that take a long time to dump the schema from the old
> cluster

Maybe it isn't representative of a typical case, but I can offer a data point:

For us, we have ~40 customers with DBs ranging in size from <100GB to ~25TB
(for which ~90% is on a ZFS tablespace with compression).  We have what's
traditionally considered to be an excessive number of child tables, which works
okay since planning time is unimportant to us for the report queries which hit
them.  Some of the tables are wide (historically up to 1600 columns).  Some of
those have default values on nearly every column, and pg_attrdef was large
(>500MB), causing pg_dump --section pre-data to be slow (10+ minutes).  Since
something similar is run by pg_upgrade, I worked around the issue for now by
dropping defaults on the historic children in advance of upgrades (at some
point I'll figure out what I have to do to allow DROPing DEFAULTs).  It's not
the first time we've seen an issue with larger number of children*columns.

This is probably worth fixing independent of other ways of speeding up pg_upgrade.

It spends most of its time making the column names unique while de-parsing the DEFAULT clause for each column.  But I don't think it ever outputs the column name which results from that deparsing, and since there is only one table involved, the names should already be unique anyway, unless I am missing something.

The time seems to be quadratic in number of columns if all columns have defaults, or proportional to the product of number of columns in table and the number of columns with defaults.

The CREATE TABLE has a similar problem upon restoring the dump.

Cheers,

Jeff
Attachment

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: User defined data types in Logical Replication
Next
From: Robert Haas
Date:
Subject: Re: pgsql: pg_upgrade: simplify code layout in a few places