On Thu, Nov 8, 2012 at 12:30:11PM -0500, Peter Eisentraut wrote:
> On 11/7/12 9:17 PM, Bruce Momjian wrote:
> > As a followup to Magnus's report that pg_upgrade was slow for many
> > tables, I did some more testing with many tables, e.g.:
> >
> > CREATE TABLE test991 (x SERIAL);
> >
> > I ran it for 0, 1k, 2k, ... 16k tables, and got these results:
> >
> >	tables	 pg_dump	 restore     pg_upgrade (increase)   [seconds]
> >	     0	    0.30	    0.24	  11.73 (-)
> >	  1000	    6.46	    6.55	  28.79 (2.45x)
> >	  2000	   29.82	   20.96	  69.75 (2.42x)
> >	  4000	   95.70	  115.88	 289.82 (4.16x)
> >	  8000	  405.38	  505.93	1168.60 (4.03x)
> >	 16000	 1702.23	 2197.56	5022.82 (4.30x)
>
> I can reproduce these numbers, more or less. (Additionally, it ran out
> of shared memory with the default setting when dumping the 8000 tables.)
>
> But this issue seems to be entirely the fault of sequences being
> present. When I replace the serial column with an int, everything
> finishes within seconds and scales seemingly linearly.
I did some more research and realized that I was not using
--schema-only, as pg_upgrade does.  With that setting, the results look
like this:
	--schema-only
	tables	 pg_dump	 restore     pg_upgrade (increase)
	     1	    0.27	    0.23	  11.73 (-)
	  1000	    3.64	    5.18	  28.79 (2.45x)
	  2000	   13.07	   14.63	  69.75 (2.42x)
	  4000	   43.93	   66.87	 289.82 (4.16x)
	  8000	  190.63	  326.67	1168.60 (4.03x)
	 16000	  757.80	 1402.82	5022.82 (4.30x)
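(For anyone wanting to reproduce this, a minimal sketch of this kind of
test is below; the table definition is from the earlier email, while
the database names and table count are illustrative:

	#!/bin/sh
	# Create N one-column SERIAL tables, then time a schema-only
	# dump and its restore into a fresh database.
	N=16000                     # also run with 1000, 2000, ... 8000

	createdb testdb
	for i in $(seq 1 $N); do
		echo "CREATE TABLE test$i (x SERIAL);"
	done | psql -q testdb

	time pg_dump --schema-only testdb > dump.sql

	createdb testdb2
	time psql -q -f dump.sql testdb2
)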
You can still see the ~4x increase, but now it shows up in all the
tests:  basically, every time the number of tables doubles, the time to
dump or restore a _single_ table doubles as well.  For example, with 1k
tables a single table takes 0.00364 seconds to dump, while with 16k
tables a single table takes 0.04736 seconds, a 13x slowdown.
Second, with --schema-only, you can see that dump/restore accounts for
only about half of pg_upgrade's total duration, and you can also see
that pg_upgrade itself is slowing down as the number of tables
increases, even ignoring the dump/restore time.
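For concreteness, using the 16k-table row above, the time pg_upgrade
spends outside of dump/restore works out to:

	$ echo "5022.82 - (757.80 + 1402.82)" | bc
	2862.20

i.e. over 47 minutes of overhead that is not dump/restore at all.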
This is all bad news. :-( I will keep digging.
-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +