Re: Further pg_upgrade analysis for many tables - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Further pg_upgrade analysis for many tables
Date
Msg-id 20121110171558.GB31383@momjian.us
Whole thread Raw
In response to Re: Further pg_upgrade analysis for many tables  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: Further pg_upgrade analysis for many tables  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers
On Fri, Nov  9, 2012 at 04:23:40PM -0800, Jeff Janes wrote:
> On Fri, Nov 9, 2012 at 3:06 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > On Thu, Nov  8, 2012 at 08:59:21PM -0800, Jeff Janes wrote:
> >> On Thu, Nov 8, 2012 at 4:33 PM, Bruce Momjian <bruce@momjian.us> wrote:
> >> >
> >> > I am actually now dumping git head/9.3, so I assume all the problems we
> >> > know about should be fixed.
> >>
> >> Are sure the server you are dumping out of is head?
> >
> > Well, I tested again with 9.2 dumping/loading 9.2 and the same for git
> > head, and got these results:
> >
> >                        pg_dump            restore
> >                    9.2       git       9.2       git
> >
> >             1      0.13      0.11      0.07      0.07
> >          1000      4.37      3.98      4.32      5.28
> >          2000     12.98     12.19     13.64     14.25
> >          4000     47.85     50.14     61.31     70.97
> >          8000    210.39    183.00    302.67    294.20
> >         16000    901.53    769.83   1399.25   1359.09
> 
> For pg_dump, there are 4 possible combinations, not just two.  You can
> use 9.2's pg_dump to dump from a 9.2 server, use git's pg_dump to dump
> from a 9.2 server, use git's pg_dump to dump from a git server, or use
> 9.2's pg_dump to dump from a git server (although that last one isn't
> very relevant).

True, but I thought doing matching versions was a sufficient test.
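As a rough sanity check on the "definitely quadratic" claim, the growth ratios in the timing table above can be computed directly.  A purely linear cost would double when the table count doubles; a quadratic term predicts roughly 4x per doubling.  (This is just an illustrative calculation on the 9.2 restore column from the table, not part of anyone's test harness.)

```python
# Restore times (seconds) for the 9.2 column of the table above,
# at 1000, 2000, 4000, 8000, and 16000 tables.
tables = [1000, 2000, 4000, 8000, 16000]
restore_92 = [4.32, 13.64, 61.31, 302.67, 1399.25]

# Ratio of consecutive timings: ~2x means linear, ~4x means quadratic.
ratios = [b / a for a, b in zip(restore_92, restore_92[1:])]
print([round(r, 2) for r in ratios])  # each doubling costs ~3-5x, not 2x
```

The later ratios sit near 4-5x per doubling, which is consistent with a dominant quadratic term at these table counts.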

> >> Using head's pg_dump, but 9.2.1 server, it takes me 179.11 seconds to
> >> dump 16,000 tables (schema only) like your example, and it is
> >> definitely quadratic.
> >
> > Are you using a SERIAL column for the tables?  I am, and Peter
> > Eisentraut reported that was a big slowdown.
> 
> Yes, I'm using the same table definition as your example.

OK.

> >> But using head's pg_dump to dump tables out of head's server, it only
> >> took 24.95 seconds, and the quadratic term is not yet important,
> >> things still look linear.
> >
> > Again, using SERIAL?
> 
> Yep.

Odd that yours is so much faster.

> >> they are stored in the file, than it is shame that pg_dump goes to the
> >> effort of extracting that info if pg_upgrade is just going to
> >> overwrite it anyway.
> >
> > Actually, pg_upgrade needs pg_dump to restore all those sequence values.
> 
> I did an experiment where I had pg_dump just output dummy values
> rather than hitting the database.  Once pg_upgrade moves the relation
> files over, the dummy values disappear and are set back to their
> originals.  So I think that pg_upgrade depends on pg_dump only in a
> trivial way--they need to be there, but it doesn't matter what they
> are.

Oh, wow, I had not thought of that.  Once we move the sequence files
into place from the old cluster, whatever value the pg_dump restore
assigned to the sequence counter is thrown away.  Good point.
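The mechanism can be sketched with plain files standing in for sequence relation files (the paths and values here are hypothetical, not pg_upgrade's real on-disk layout): whatever counter value the restore assigns to the new cluster's sequence is clobbered when pg_upgrade copies the old cluster's relation file over it.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as d:
    old = os.path.join(d, "old_seq")  # sequence file in the old cluster
    new = os.path.join(d, "new_seq")  # same sequence in the new cluster

    # The old cluster's sequence holds the real counter value.
    with open(old, "w") as f:
        f.write("12345")

    # Step 1: the pg_dump restore assigns some value to the new sequence;
    # a dummy would work just as well, which is the point being made.
    with open(new, "w") as f:
        f.write("1")

    # Step 2: pg_upgrade moves the old relation file into place.
    shutil.copyfile(old, new)

    with open(new) as f:
        print(f.read())  # prints "12345" -- the restored value is gone
```

This is why the dumped sequence values only need to be syntactically present for the restore to succeed; their contents never survive the file transfer.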

I am hesitant to add an optimization to pg_dump for this unless we
decide that pg_dump uses sequences in some non-optimal way that would
not warrant improving general sequence creation performance.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


