Thread: BUG #7884: pg_upgradecluster is terribly slow

BUG #7884: pg_upgradecluster is terribly slow

From
rtvd@me.com
Date:
The following bug has been logged on the website:

Bug reference:      7884
Logged by:          Denys Rtveliashvili
Email address:      rtvd@me.com
PostgreSQL version: 9.1.7
Operating system:   Ubuntu 12.04
Description:        =


I have tried upgrading a database from 8.4 to 9.1 and pg_upgradecluster is
terribly slow.

The bandwidth of data stream is about 10MB/sec (measured by iostat). This is
quite bad, as this runs on a quite new and powerful machine (new, with
proper battery-backed HW RAID, many disks).

IO is certainly not a bottleneck. The problem is that "psql" and "pg_dump"
are consuming a lot of CPU power while doing almost nothing.

When I do "pg_dump > /dev/null" I see read bandwidth about 10MB/sec. This is
tiny comparing to 500MB/sec shown by iozone3.

Also, I think there are obvious problems with pg_upgradecluster:

1. It uses "-Fc". This is bad as it enables compression by default. Which is
pointless and wastes CPU power.

2. It seems to migrate the schema first and only then it moves data. This is
bad as indices are being re-built when the data is finally poured in.

3. The migration does not happen within a transaction. Thus, there are two
consequences:

3.1. The migration will not fail obviously in case data cannot be moved
carefully for whatever reason. This is just not safe.

3.2. The migration does not work as fast as it could. "COPY" can run faster
in case it happens within the same transaction where table has been created.
However, as creation of the table and copying into it are separated, the
postgres has to do a lot of useless work which slows it down.

I was able to improve the performance of upgrade approximately 4 times by
following a different approach:

1. migrate the schema

2. drop all databases (leaving roles in)

3. re-creating the databases (without any content)

4. for each database run pg_dump <database> | psql --single-transaction
<database>

Perhaps the same thing should also be used in pg_upgradecluster?

Re: BUG #7884: pg_upgradecluster is terribly slow

From
Kevin Grittner
Date:
"rtvd@me.com" <rtvd@me.com> wrote:=0A=0A> I have tried upgrading a database=
 from 8.4 to 9.1 and=0A> pg_upgradecluster is terribly slow.=0A=0AI had nev=
er heard of pg_upgradecluster sluster; a google search=0Asuggests that ther=
e is an Ubuntu utility by that name, so you might=0Awant to register a comp=
laint there; this list is definitely not the=0Aright place for that.=0A=0AY=
ou might want to try the utility which *is* supported by the=0APostgreSQL c=
ommunity, which is pg_upgrade.=A0 For speed, consider=0Ausing the -k option=
.=A0 I've been able to upgrade a 3 TB database in=0Ajust a few minutes with=
 that.=A0 (Timing depends more on the number=0Aof database objects than the=
ir size.)=0A=0Ahttp://www.postgresql.org/docs/9.1/interactive/pgupgrade.htm=
l=0A=0A-- =0AKevin Grittner=0AEnterpriseDB: http://www.enterprisedb.com=0AT=
he Enterprise PostgreSQL Company

Re: BUG #7884: pg_upgradecluster is terribly slow

From
Martin Pitt
Date:
Hello,

rtvd@me.com [2013-02-15 17:41 +0000]:
> I have tried upgrading a database from 8.4 to 9.1 and pg_upgradecluster is
> terribly slow.

Please note that this is a Debian/Ubuntu specific bug,
pg_upgradecluster is not shipped by upstream PostgreSQL.

I admittedly don't have any first-hand experience with migrating large
databases, and get very little feedback about it, so thanks for your
observations!

First of all, Peter has worked on a branch that makes
pg_upgradecluster use pg_upgrade, which hopefully will be a lot more
performant.

> 1. It uses "-Fc". This is bad as it enables compression by default. Which is
> pointless and wastes CPU power.

The tar format (-Ft) has also been supported since at least 8.4
(that's the earliest version which is still supported in
Debian/Ubuntu), and documentation says it's not compressed. If you
replace -Fc with -Ft, do you see a significant performance increase?

> 2. It seems to migrate the schema first and only then it moves data. This is
> bad as indices are being re-built when the data is finally poured in.

At least in earlier versions pg_dumpall wasn't able to dump BLOBs, and
the manpage doesn't indicate otherwise for 9.2 either. AFAIK this just
supports the plain SQL text format. I don't believe pg_dumpall would
avoid the rebuilding of incides?

> 3. The migration does not happen within a transaction.

> 1. migrate the schema

pg_upgradecluster does that, too.

> 2. drop all databases (leaving roles in)
> 3. re-creating the databases (without any content)

Out of interest, why does that help?

> 4. for each database run pg_dump <database> | psql --single-transaction
> <database>

As written above, I don't want to use the text format and psql, but
pg_restore also supports --single-transaction since at least 8.4.

The question which I'm not sure about is whether it's ok to use
--single-transaction even for very large databases. I. e. is piling up
gigabytes of data in a transaction and committing it all in the end
always more efficient than the default mode (which I assume will use
one transaction by row)? Is that a safe thing to do, or could one run
into out-of-memory conditions?

http://people.canonical.com/~pitti/tmp/pg_upgradecluster is a version
with these two changes: pg_restore --single-transaction and using the
tar format. Perhaps you can try this, and compare performance?

Thanks,

Martin
--
Martin Pitt                        | http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)

Re: BUG #7884: pg_upgradecluster is terribly slow

From
John R Pierce
Date:
On 2/15/2013 10:27 AM, Kevin Grittner wrote:
> You might want to try the utility which*is*  supported by the
> PostgreSQL community, which is pg_upgrade.  For speed, consider
> using the -k option.  I've been able to upgrade a 3 TB database in
> just a few minutes with that.  (Timing depends more on the number
> of database objects than their size.)

do note this -k option only functions if the old and new cluster are on
the same mount point, so the files can be mv'd



--
john r pierce                                      37N 122W
somewhere on the middle of the left coast

Re: BUG #7884: pg_upgradecluster is terribly slow

From
Bruce Momjian
Date:
On Fri, Feb 15, 2013 at 03:55:02PM -0800, John R Pierce wrote:
> On 2/15/2013 10:27 AM, Kevin Grittner wrote:
>
>     You might want to try the utility which *is* supported by the
>     PostgreSQL community, which is pg_upgrade.  For speed, consider
>     using the -k option.  I've been able to upgrade a 3 TB database in
>     just a few minutes with that.  (Timing depends more on the number
>     of database objects than their size.)
>
>
> do note this -k option only functions if the old and new cluster are on the
> same mount point, so the files can be mv'd

Actually, hard linked, not moved.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Re: BUG #7884: pg_upgradecluster is terribly slow

From
John R Pierce
Date:
On 2/15/2013 4:27 PM, Bruce Momjian wrote:
>> >
>> >do note this -k option only functions if the old and new cluster are on the
>> >same mount point, so the files can be mv'd
> Actually, hard linked, not moved.


errr, that.   right.   ;)



--
john r pierce                                      37N 122W
somewhere on the middle of the left coast