Re: Further pg_upgrade analysis for many tables - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Further pg_upgrade analysis for many tables
Date
Msg-id 20121109032523.GB26605@momjian.us
Whole thread Raw
In response to Re: Further pg_upgrade analysis for many tables  (Peter Eisentraut <peter@eisentraut.org>)
Responses Re: Further pg_upgrade analysis for many tables  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers
On Thu, Nov  8, 2012 at 12:30:11PM -0500, Peter Eisentraut wrote:
> On 11/7/12 9:17 PM, Bruce Momjian wrote:
> > As a followup to Magnus's report that pg_upgrade was slow for many
> > tables, I did some more testing with many tables, e.g.:
> > 
> >     CREATE TABLE test991 (x SERIAL);
> > 
> > I ran it for 0, 1k, 2k, ... 16k tables, and got these results:
> > 
> >     tables    pg_dump     restore     pg_upgrade(increase)
> >         0       0.30        0.24       11.73(-)
> >      1000       6.46        6.55       28.79(2.45x)
> >      2000      29.82       20.96       69.75(2.42x)
> >      4000      95.70      115.88      289.82(4.16x)
> >      8000     405.38      505.93     1168.60(4.03x)
> >     16000    1702.23     2197.56     5022.82(4.30x)
> 
> I can reproduce these numbers, more or less.  (Additionally, it ran out
> of shared memory with the default setting when dumping the 8000 tables.)
> 
> But this issue seems to be entirely the fault of sequences being
> present.  When I replace the serial column with an int, everything
> finishes within seconds and scales seemingly linearly.

I did some more research and realized that I was not using --schema-only
like pg_upgrade uses.  With that setting, things look like this:
--schema-onlytables    pg_dump     restore  pg_upgrade1           0.27        0.23    11.73(-)1000        3.64
5.18   28.79(2.45)2000       13.07       14.63    69.75(2.42)4000       43.93       66.87   289.82(4.16)8000
190.63     326.67  1168.60(4.03)16000     757.80     1402.82  5022.82(4.30)
 

You can still see the 4x increase, but it now for all tests ---
basically, every time the number of tables doubles, the time to dump or
restore a _single_ table doubles, e.g. for 1k tables, a single table
takes 0.00364 to dump, for 16k tables, a single table takes 0.04736 to
dump, a 13x slowdown.

Second, with --schema-only, you can see the dump/restore is only 50% of
the duration of pg_upgrade, and you can also see that pg_upgrade itself
is slowing down as the number of tables increases, even ignoring the
dump/reload time.

This is all bad news.  :-(  I will keep digging.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Enabling Checksums
Next
From: Jeff Janes
Date:
Subject: Re: Further pg_upgrade analysis for many tables