Thread: BUG #11033: 'pg_dump -a' much slower than 'pg_dump'

BUG #11033: 'pg_dump -a' much slower than 'pg_dump'

From
joe@tanga.com
Date:
The following bug has been logged on the website:

Bug reference:      11033
Logged by:          Joe Van Dyk
Email address:      joe@tanga.com
PostgreSQL version: 9.3.5
Operating system:   Ubuntu 12.04
Description:

I've got a small database that takes about 0.8 seconds to dump with
'pg_dump' and about 11 seconds to dump with 'pg_dump -a'.

I've narrowed it down to the foreign key constraints in the database, if
those are removed, then 'pg_dump -a' becomes fast again.

I can't come up with an artificial test case. I can reproduce it using our
company's db schema. I don't want to publicly post that schema to the
mailing list, but I'd be happy to send the schema that shows the problem to
someone privately.

I can reproduce the problem in 9.3.4 and 9.4.beta1.

Re: BUG #11033: 'pg_dump -a' much slower than 'pg_dump'

From
Joe Van Dyk
Date:
This is what I'm seeing, as far as timing.

$ time pg_dump tanga_dev > /dev/null

real 0m1.182s
user 0m0.260s
sys 0m0.020s


$ time pg_dump -a tanga_dev > /dev/null
pg_dump: NOTICE: there are circular foreign-key constraints among these
table(s):
pg_dump:   products
pg_dump: You might not be able to restore the dump without using
--disable-triggers or temporarily dropping the constraints.
pg_dump: Consider using a full dump instead of a --data-only dump to avoid
this problem.
    <snip a few more of these>

real 0m11.390s
user 0m10.653s
sys 0m0.032s


On Thu, Jul 24, 2014 at 5:10 PM, <joe@tanga.com> wrote:

> The following bug has been logged on the website:
>
> Bug reference:      11033
> Logged by:          Joe Van Dyk
> Email address:      joe@tanga.com
> PostgreSQL version: 9.3.5
> Operating system:   Ubuntu 12.04
> Description:
>
> I've got a small database that takes about 0.8 seconds to dump with
> 'pg_dump' and about 11 seconds to dump with 'pg_dump -a'.
>
> I've narrowed it down to the foreign key constraints in the database, if
> those are removed, then 'pg_dump -a' becomes fast again.
>
> I can't come up with an artificial test case. I can reproduce it using our
> company's db schema. I don't want to publicly post that schema to the
> mailing list, but I'd be happy to send the schema that shows the problem to
> someone privately.
>
> I can reproduce the problem in 9.3.4 and 9.4.beta1.
>
>
> --
> Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-bugs
>

Re: BUG #11033: 'pg_dump -a' much slower than 'pg_dump'

From
Tom Lane
Date:
joe@tanga.com writes:
> I've got a small database that takes about 0.8 seconds to dump with
> 'pg_dump' and about 11 seconds to dump with 'pg_dump -a'.

> I've narrowed it down to the foreign key constraints in the database, if
> those are removed, then 'pg_dump -a' becomes fast again.

Given that and the warning messages shown in your followup, a plausible
guess is that pg_dump is wasting a lot of time vainly searching for ways
to break the circular dependency loops that arise from trying to order
the tables in a way that allows them to be restored with foreign key
constraints active.  However: (a) it seems like it'd take a heck of a lot
of FK constraints for that to become a dominant factor, and (b) if that
were the explanation, seems like it should not be that hard to make an
artificial test case.  So I'm thinking there's some other contributing
factor you haven't shown us.

> I can't come up with an artificial test case. I can reproduce it using our
> company's db schema. I don't want to publicly post that schema to the
> mailing list, but I'd be happy to send the schema that shows the problem to
> someone privately.

I'd be willing to take a look, but it's not clear that there is any easy
fix, and TBH I'm not sure that this case is worth spending a lot of time
on.  As the warning messages are telling you, a --data-only dump is of
limited value when you've got circular foreign key references.  What
are you intending to do with the dump anyway?

            regards, tom lane

Re: BUG #11033: 'pg_dump -a' much slower than 'pg_dump'

From
Joe Van Dyk
Date:
On Thu, Jul 24, 2014 at 6:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> joe@tanga.com writes:
> > I've got a small database that takes about 0.8 seconds to dump with
> > 'pg_dump' and about 11 seconds to dump with 'pg_dump -a'.
>
> > I've narrowed it down to the foreign key constraints in the database, if
> > those are removed, then 'pg_dump -a' becomes fast again.
>
> Given that and the warning messages shown in your followup, a plausible
> guess is that pg_dump is wasting a lot of time vainly searching for ways
> to break the circular dependency loops that arise from trying to order
> the tables in a way that allows them to be restored with foreign key
> constraints active.  However: (a) it seems like it'd take a heck of a lot
> of FK constraints for that to become a dominant factor, and (b) if that
> were the explanation, seems like it should not be that hard to make an
> artificial test case.  So I'm thinking there's some other contributing
> factor you haven't shown us.
>

I don't have an unreasonable amount of foreign key constraints, as far as I
can tell.

For an example, on one of my slower systems, adding two foreign key
constraints caused 'pg_dump -a' to get 5 seconds slower (goes from 15
seconds to 20) -- the constraints are on tables that have 12 and 5 rows.

I'll send you a test db shortly. Thanks for looking into it!


>
> > I can't come up with an artificial test case. I can reproduce it using
> our
> > company's db schema. I don't want to publicly post that schema to the
> > mailing list, but I'd be happy to send the schema that shows the problem
> to
> > someone privately.
>
> I'd be willing to take a look, but it's not clear that there is any easy
> fix, and TBH I'm not sure that this case is worth spending a lot of time
> on.  As the warning messages are telling you, a --data-only dump is of
> limited value when you've got circular foreign key references.  What
> are you intending to do with the dump anyway?
>

This is used for running automated tests that modify the database. Before
the tests are run, I use 'pg_dump -a' to generate the initial set of data
that the tests use. Then, before each test, I disable all the triggers,
truncate all the tables, load the data-only dump into the database, then
re-enable the triggers. As far as I can tell, this is the fastest way to
revert a database to a known state, it's faster than dropping/recreating
databases.

Joe


>
>                         regards, tom lane
>

Re: BUG #11033: 'pg_dump -a' much slower than 'pg_dump'

From
Bruce Momjian
Date:
On Fri, Jul 25, 2014 at 12:10:52AM +0000, joe@tanga.com wrote:
> The following bug has been logged on the website:
>
> Bug reference:      11033
> Logged by:          Joe Van Dyk
> Email address:      joe@tanga.com
> PostgreSQL version: 9.3.5
> Operating system:   Ubuntu 12.04
> Description:
>
> I've got a small database that takes about 0.8 seconds to dump with
> 'pg_dump' and about 11 seconds to dump with 'pg_dump -a'.
>
> I've narrowed it down to the foreign key constraints in the database, if
> those are removed, then 'pg_dump -a' becomes fast again.
>
> I can't come up with an artificial test case. I can reproduce it using our
> company's db schema. I don't want to publicly post that schema to the
> mailing list, but I'd be happy to send the schema that shows the problem to
> someone privately.
>
> I can reproduce the problem in 9.3.4 and 9.4.beta1.

Wow, that is certainly odd.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Re: BUG #11033: 'pg_dump -a' much slower than 'pg_dump'

From
Andres Freund
Date:
On 2014-10-11 17:26:59 -0400, Bruce Momjian wrote:
> On Fri, Jul 25, 2014 at 12:10:52AM +0000, joe@tanga.com wrote:
> > The following bug has been logged on the website:
> >
> > Bug reference:      11033
> > Logged by:          Joe Van Dyk
> > Email address:      joe@tanga.com
> > PostgreSQL version: 9.3.5
> > Operating system:   Ubuntu 12.04
> > Description:
> >
> > I've got a small database that takes about 0.8 seconds to dump with
> > 'pg_dump' and about 11 seconds to dump with 'pg_dump -a'.
> >
> > I've narrowed it down to the foreign key constraints in the database, if
> > those are removed, then 'pg_dump -a' becomes fast again.
> >
> > I can't come up with an artificial test case. I can reproduce it using our
> > company's db schema. I don't want to publicly post that schema to the
> > mailing list, but I'd be happy to send the schema that shows the problem to
> > someone privately.
> >
> > I can reproduce the problem in 9.3.4 and 9.4.beta1.
>
> Wow, that is certainly odd.

There've been a couple cases of that where the dependency resolution
gets more complex for data only dumps because of the added dependencies
that try to get the order right to not violate foreign keys. IIRC Tom
fixed a couple performance problems recently.
Yep, 51fc6133488a80a1310972b8a0ad20aca13f5b02.

Joe, can you check with some more recent version? Unfortunately that fix
got committed after 9.3.5. 9.4 beta3 should be fine.

Andres

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: BUG #11033: 'pg_dump -a' much slower than 'pg_dump'

From
Joe Van Dyk
Date:
On Sat, Oct 11, 2014 at 2:47 PM, Andres Freund <andres@2ndquadrant.com>
wrote:

> On 2014-10-11 17:26:59 -0400, Bruce Momjian wrote:
> > On Fri, Jul 25, 2014 at 12:10:52AM +0000, joe@tanga.com wrote:
> > > The following bug has been logged on the website:
> > >
> > > Bug reference:      11033
> > > Logged by:          Joe Van Dyk
> > > Email address:      joe@tanga.com
> > > PostgreSQL version: 9.3.5
> > > Operating system:   Ubuntu 12.04
> > > Description:
> > >
> > > I've got a small database that takes about 0.8 seconds to dump with
> > > 'pg_dump' and about 11 seconds to dump with 'pg_dump -a'.
> > >
> > > I've narrowed it down to the foreign key constraints in the database,
> if
> > > those are removed, then 'pg_dump -a' becomes fast again.
> > >
> > > I can't come up with an artificial test case. I can reproduce it using
> our
> > > company's db schema. I don't want to publicly post that schema to the
> > > mailing list, but I'd be happy to send the schema that shows the
> problem to
> > > someone privately.
> > >
> > > I can reproduce the problem in 9.3.4 and 9.4.beta1.
> >
> > Wow, that is certainly odd.
>
> There've been a couple cases of that where the dependency resolution
> gets more complex for data only dumps because of the added dependencies
> that try to get the order right to not violate foreign keys. IIRC Tom
> fixed a couple performance problems recently.
> Yep, 51fc6133488a80a1310972b8a0ad20aca13f5b02.
>
> Joe, can you check with some more recent version? Unfortunately that fix
> got committed after 9.3.5. 9.4 beta3 should be fine.
>

I've been using the 9.3-stable branch for the past few months (as a result
of needing this fix), this problem has gone away as far as I can tell.

Joe

Re: BUG #11033: 'pg_dump -a' much slower than 'pg_dump'

From
Joe Van Dyk
Date:
On Wed, Oct 15, 2014 at 11:58 PM, Joe Van Dyk <joe@tanga.com> wrote:

> On Sat, Oct 11, 2014 at 2:47 PM, Andres Freund <andres@2ndquadrant.com>
> wrote:
>
>> On 2014-10-11 17:26:59 -0400, Bruce Momjian wrote:
>> > On Fri, Jul 25, 2014 at 12:10:52AM +0000, joe@tanga.com wrote:
>> > > The following bug has been logged on the website:
>> > >
>> > > Bug reference:      11033
>> > > Logged by:          Joe Van Dyk
>> > > Email address:      joe@tanga.com
>> > > PostgreSQL version: 9.3.5
>> > > Operating system:   Ubuntu 12.04
>> > > Description:
>> > >
>> > > I've got a small database that takes about 0.8 seconds to dump with
>> > > 'pg_dump' and about 11 seconds to dump with 'pg_dump -a'.
>> > >
>> > > I've narrowed it down to the foreign key constraints in the database,
>> if
>> > > those are removed, then 'pg_dump -a' becomes fast again.
>> > >
>> > > I can't come up with an artificial test case. I can reproduce it
>> using our
>> > > company's db schema. I don't want to publicly post that schema to the
>> > > mailing list, but I'd be happy to send the schema that shows the
>> problem to
>> > > someone privately.
>> > >
>> > > I can reproduce the problem in 9.3.4 and 9.4.beta1.
>> >
>> > Wow, that is certainly odd.
>>
>> There've been a couple cases of that where the dependency resolution
>> gets more complex for data only dumps because of the added dependencies
>> that try to get the order right to not violate foreign keys. IIRC Tom
>> fixed a couple performance problems recently.
>> Yep, 51fc6133488a80a1310972b8a0ad20aca13f5b02.
>>
>> Joe, can you check with some more recent version? Unfortunately that fix
>> got committed after 9.3.5. 9.4 beta3 should be fine.
>>
>
> I've been using the 9.3-stable branch for the past few months (as a result
> of needing this fix), this problem has gone away as far as I can tell.
>
>
I forgot this never got posted to the list -- this is what Tom wrote back
in July:

"I've not entirely worked out just what's creating the performance issue,
but it seemed like contributing factors included (a) there were long
chains of FK references (not loops, just table A refers to table B refers
to table C yadda yadda), and (b) there were a lot of plain old objects,
such as functions.  The reference chains seemed to be creating a
combinatorial explosion in the number of times the search reached the
plain old objects, but I've still not quite wrapped my head around exactly
why.  It might be that there's an easier/better fix available than this
one."

Joe