Thread: pg_dump issues
While investigating a client problem I just observed that pg_dump takes
a surprisingly large amount of time to dump a schema with a large number
of views. The client's hardware is quite spiffy, and yet pg_dump is
taking many minutes to dump a schema with some 35,000 views. Here's a
simple test case:

    create schema views;
    do 'begin for i in 1 .. 10000 loop execute $$create view views.v_$$
    || i ||$$ as select current_date as d, current_timestamp as ts,
    $_$a$_$::text || n as t, n from generate_series(1,5) as n$$; end
    loop; end;';

On my modest hardware this database took 4m18.864s for pg_dump to run.
Should we be looking at replacing the retail operations which consume
most of this time with something that runs faster?

There is also this gem of behaviour, which is where I started:

    p1                          p2
    begin;
    drop view foo;
                                pg_dump
    commit;
                                boom.

with this error:

    2011-10-01 16:38:20 EDT [27084] 30063 ERROR:  could not open
    relation with OID 133640
    2011-10-01 16:38:20 EDT [27084] 30064 STATEMENT:  SELECT
    pg_catalog.pg_get_viewdef('133640'::pg_catalog.oid) AS viewdef

Of course, this isn't caused by having a large catalog, but it's
terrible nevertheless. I'm not sure what to do about it.

cheers

andrew
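For reference, the "retail" pattern is visible in the STATEMENT line of the
log above: pg_dump asks the server for each view definition one query at a
time. A single batched pass over the test schema would look roughly like the
sketch below; this is only an illustration of the idea against the views
schema created by the test case, not anything pg_dump currently does:

    -- what pg_dump effectively does, once per view (retail):
    SELECT pg_catalog.pg_get_viewdef('views.v_1'::regclass) AS viewdef;

    -- one pass over every view in the test schema instead:
    SELECT c.oid, c.relname,
           pg_catalog.pg_get_viewdef(c.oid) AS viewdef
    FROM pg_catalog.pg_class c
         JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relkind = 'v' AND n.nspname = 'views';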
On 10/01/2011 05:08 PM, Andrew Dunstan wrote:
> While investigating a client problem I just observed that pg_dump takes
> a surprisingly large amount of time to dump a schema with a large number
> of views. The client's hardware is quite spiffy, and yet pg_dump is
> taking many minutes to dump a schema with some 35,000 views. Here's a
> simple test case:
>
>     create schema views;
>     do 'begin for i in 1 .. 10000 loop execute $$create view views.v_$$
>     || i ||$$ as select current_date as d, current_timestamp as ts,
>     $_$a$_$::text || n as t, n from generate_series(1,5) as n$$; end
>     loop; end;';
>
> On my modest hardware this database took 4m18.864s for pg_dump to run.
> Should we be looking at replacing the retail operations which consume
> most of this time with something that runs faster?

How modest?  Was there anything else in the database?  I tried with 9000
views (because I didn't want to bother increasing
max_locks_per_transaction) and the pg_dump ran in less than 10 seconds
(8.991s), redirecting the (plain-text) output to a file (this is on a
Core i5).

> There is also this gem of behaviour, which is where I started:
>
>     p1                          p2
>     begin;
>     drop view foo;
>                                 pg_dump
>     commit;
>                                 boom.
>
> with this error:
>
>     2011-10-01 16:38:20 EDT [27084] 30063 ERROR:  could not open
>     relation with OID 133640
>     2011-10-01 16:38:20 EDT [27084] 30064 STATEMENT:  SELECT
>     pg_catalog.pg_get_viewdef('133640'::pg_catalog.oid) AS viewdef
>
> Of course, this isn't caused by having a large catalog, but it's
> terrible nevertheless. I'm not sure what to do about it.

Couldn't you run pg_dump with --lock-wait-timeout?

Joe
On 10/01/2011 05:48 PM, Joe Abbate wrote:
> On 10/01/2011 05:08 PM, Andrew Dunstan wrote:
>> While investigating a client problem I just observed that pg_dump takes
>> a surprisingly large amount of time to dump a schema with a large number
>> of views. The client's hardware is quite spiffy, and yet pg_dump is
>> taking many minutes to dump a schema with some 35,000 views. Here's a
>> simple test case:
>>
>>     create schema views;
>>     do 'begin for i in 1 .. 10000 loop execute $$create view views.v_$$
>>     || i ||$$ as select current_date as d, current_timestamp as ts,
>>     $_$a$_$::text || n as t, n from generate_series(1,5) as n$$; end
>>     loop; end;';
>>
>> On my modest hardware this database took 4m18.864s for pg_dump to run.
>> Should we be looking at replacing the retail operations which consume
>> most of this time with something that runs faster?
>
> How modest?  Was there anything else in the database?  I tried with 9000
> views (because I didn't want to bother increasing
> max_locks_per_transaction) and the pg_dump ran in less than 10 seconds
> (8.991s), redirecting the (plain-text) output to a file (this is on a
> Core i5).

Yeah, it must be pretty modest :-) On more powerful h/w I get the same.
I need to dig further into why it's taking so long to dump my client's
schema on server-class hardware.

>> There is also this gem of behaviour, which is where I started:
>>
>>     p1                          p2
>>     begin;
>>     drop view foo;
>>                                 pg_dump
>>     commit;
>>                                 boom.
>>
>> with this error:
>>
>>     2011-10-01 16:38:20 EDT [27084] 30063 ERROR:  could not open
>>     relation with OID 133640
>>     2011-10-01 16:38:20 EDT [27084] 30064 STATEMENT:  SELECT
>>     pg_catalog.pg_get_viewdef('133640'::pg_catalog.oid) AS viewdef
>>
>> Of course, this isn't caused by having a large catalog, but it's
>> terrible nevertheless. I'm not sure what to do about it.
>
> Couldn't you run pg_dump with --lock-wait-timeout?

How would that help? This isn't a lock failure.

cheers

andrew
Hi Andrew,

On 10/01/2011 09:46 PM, Andrew Dunstan wrote:
> On 10/01/2011 05:48 PM, Joe Abbate wrote:
>> On 10/01/2011 05:08 PM, Andrew Dunstan wrote:
>>> There is also this gem of behaviour, which is where I started:
>>>
>>>     p1                          p2
>>>     begin;
>>>     drop view foo;
>>>                                 pg_dump
>>>     commit;
>>>                                 boom.
>>>
>>> with this error:
>>>
>>>     2011-10-01 16:38:20 EDT [27084] 30063 ERROR:  could not open
>>>     relation with OID 133640
>>>     2011-10-01 16:38:20 EDT [27084] 30064 STATEMENT:  SELECT
>>>     pg_catalog.pg_get_viewdef('133640'::pg_catalog.oid) AS viewdef
>>>
>>> Of course, this isn't caused by having a large catalog, but it's
>>> terrible nevertheless. I'm not sure what to do about it.
>>
>> Couldn't you run pg_dump with --lock-wait-timeout?
>
> How would that help? This isn't a lock failure.

I misinterpreted the error.  I have confirmed the behavior you see.  I'm
somewhat surprised there appears to be no ability to lock a database
exclusively for something like pg_dump/pg_restore, so you're not
surprised by concurrent activity against the catalogs.  I'm guessing the
assumption is that MVCC will take care of that?

Joe
Andrew Dunstan <andrew@dunslane.net> writes:
> While investigating a client problem I just observed that pg_dump takes
> a surprisingly large amount of time to dump a schema with a large number
> of views. The client's hardware is quite spiffy, and yet pg_dump is
> taking many minutes to dump a schema with some 35,000 views. Here's a
> simple test case:

>     create schema views;
>     do 'begin for i in 1 .. 10000 loop execute $$create view views.v_$$
>     || i ||$$ as select current_date as d, current_timestamp as ts,
>     $_$a$_$::text || n as t, n from generate_series(1,5) as n$$; end
>     loop; end;';

> On my modest hardware this database took 4m18.864s for pg_dump to run.

It takes about that on my machine too ... with --enable-cassert.
oprofile said that 90% of the runtime was going into AllocSetCheck,
so I rebuilt without cassert, and the runtime dropped to 16 seconds.
What were you testing?

(Without cassert, it looks like LockReassignCurrentOwner is the next
biggest time sink; I'm wondering if there's some sort of O(N^2) behavior
in there.)

			regards, tom lane
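A quick way to check whether the server you're timing against was built
with --enable-cassert is to look at the debug_assertions setting from any
session:

    -- shows 'on' for an assert-enabled build, 'off' otherwise
    SHOW debug_assertions;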
On Oct2, 2011, at 23:15 , Joe Abbate wrote:
> I'm somewhat surprised there appears to be no ability to lock a database
> exclusively for something like pg_dump/pg_restore, so you're not
> surprised by concurrent activity against the catalogs.  I'm guessing the
> assumption is that MVCC will take care of that?

I think the hope is more that it will, one day.

Currently, postgres internally accesses the catalog with SnapshotNow, not
with an MVCC snapshot. This is necessary to ensure, for example, that rows
inserted into a table also get inserted into a newly created index.

This wouldn't affect pg_dump if it only accessed the catalog via SQL, but
it doesn't. pg_dump also depends on some server-side functions to do its
work, and since these functions in turn use SnapshotNow-based internal
backend functions, pg_dump essentially uses a mix of SnapshotNow and its
transaction's MVCC snapshot.

There has been talk about reducing the use of SnapshotNow for catalog
access, but AFAIK there's no concrete proposal, and certainly no patch,
available.

best regards,
Florian Pflug
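To make that mix concrete, here is a rough by-hand reproduction of the
failure, with one session standing in for pg_dump. This is only a sketch
against the views.v_1 view from Andrew's test case, on a server of that
era; the exact error text may differ on other versions:

    -- session A: plays the role of pg_dump
    begin isolation level repeatable read;
    select 'views.v_1'::regclass::oid;   -- note the OID, say 133640

    -- session B: the concurrent DDL from the p1 column earlier
    drop view views.v_1;

    -- session A: the transaction's MVCC snapshot still sees the view ...
    select count(*) from pg_catalog.pg_class where relname = 'v_1';
    -- ... but the SnapshotNow-based backend function no longer can:
    select pg_catalog.pg_get_viewdef(133640::oid);  -- use the OID noted above
    -- fails with something like:
    --   ERROR:  could not open relation with OID 133640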
On Sat, Oct 1, 2011 at 9:46 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> How would that help? This isn't a lock failure.

It is, rather, a failure to lock.  Currently, LOCK TABLE only works on
tables, and pg_dump only applies it to tables.  If the offending object
had been a table rather than a view, pg_dump would (I believe) have
blocked trying to obtain an AccessShareLock against the existing
AccessExclusiveLock.

We talked about allowing locks on other types of relations, but due to
some bad syntax choices in the past it's not completely obvious how to
shoehorn that in.

http://archives.postgresql.org/pgsql-hackers/2011-06/msg00119.php

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
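The gap is easy to see by hand on a server of that vintage (again using a
view from the earlier test case; the exact error wording may vary by
version):

    begin;
    lock table views.v_1 in access share mode;
    -- fails with something like:
    --   ERROR:  "v_1" is not a table
    rollback;

    -- the same statement against an ordinary table simply takes the lock,
    -- and would queue behind a concurrent DROP's AccessExclusiveLock.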
Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, Oct 1, 2011 at 9:46 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>> How would that help? This isn't a lock failure.

> It is, rather, a failure to lock.  Currently, LOCK TABLE only works on
> tables, and pg_dump only applies it to tables.  If the offending object
> had been a table rather than a view, pg_dump would (I believe) have
> blocked trying to obtain an AccessShareLock against the existing
> AccessExclusiveLock.

Yeah, and it would still have failed once the lock was released.  Short
of providing some sort of global DDL-blocking lock (with attendant
performance consequences) it's not clear how to create an entirely
bulletproof solution here.

This isn't a new problem --- we've been aware of pg_dump's limitations
in this area for many years.

			regards, tom lane
Excerpts from Tom Lane's message of lun oct 03 01:47:18 -0300 2011:
> (Without cassert, it looks like LockReassignCurrentOwner is the next
> biggest time sink; I'm wondering if there's some sort of O(N^2) behavior
> in there.)

That seems fishy.  Even if there weren't quadratic behavior, should this
be called at all?  AFAIK it should only be used in cases where there are
subtransactions at work, and I don't think pg_dump uses them.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On 10/03/2011 12:47 AM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> While investigating a client problem I just observed that pg_dump takes
>> a surprisingly large amount of time to dump a schema with a large number
>> of views. The client's hardware is quite spiffy, and yet pg_dump is
>> taking many minutes to dump a schema with some 35,000 views. Here's a
>> simple test case:
>>     create schema views;
>>     do 'begin for i in 1 .. 10000 loop execute $$create view views.v_$$
>>     || i ||$$ as select current_date as d, current_timestamp as ts,
>>     $_$a$_$::text || n as t, n from generate_series(1,5) as n$$; end
>>     loop; end;';
>> On my modest hardware this database took 4m18.864s for pg_dump to run.
>
> It takes about that on my machine too ... with --enable-cassert.
> oprofile said that 90% of the runtime was going into AllocSetCheck,
> so I rebuilt without cassert, and the runtime dropped to 16 seconds.
> What were you testing?

Yeah, you're right, that must have been it. That's a big hit; I didn't
realise cassert was so heavy. (Note to self: test with production build
settings.)

I don't seem to be batting 1000 ... I still need to get to the bottom of
why the client's machine is taking so long.

I do notice that we seem to be doing a lot of repetitive tasks, e.g.
calling pg_format_type() over and over for the same arguments. Would we
be better off caching that?

cheers

andrew
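A rough, server-side way to gauge how much a cache of those per-column type
lookups could save (just a sketch; pg_dump does the lookups from its own
client-side code, this only measures how repetitive the arguments are):

    -- column-level lookups vs. distinct (type, typmod) combinations;
    -- a big gap suggests a cache keyed on those two values would pay off
    SELECT count(*) AS column_lookups,
           count(DISTINCT (a.atttypid, a.atttypmod)) AS distinct_types
    FROM pg_catalog.pg_attribute a
         JOIN pg_catalog.pg_class c ON c.oid = a.attrelid
    WHERE c.relkind IN ('r', 'v')
      AND a.attnum > 0
      AND NOT a.attisdropped;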
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Excerpts from Tom Lane's message of lun oct 03 01:47:18 -0300 2011:
>> (Without cassert, it looks like LockReassignCurrentOwner is the next
>> biggest time sink; I'm wondering if there's some sort of O(N^2) behavior
>> in there.)

> That seems fishy.  Even if there weren't quadratic behavior, should this
> be called at all?  AFAIK it should only be used in cases where there are
> subtransactions at work, and I don't think pg_dump uses them.

I wondered that too, but the calls are legit --- they're coming from
PortalDrop.  It appears that most of the calls don't actually have
anything to do, but they're iterating through a rather large local lock
table to find that out.  We probably ought to think of a way to avoid
that.  The trick is to not make performance worse for typical small
transactions that aren't holding many locks (which I think was the
design center for this to begin with).

			regards, tom lane