Thread: pg_dump issues
While investigating a client problem I just observed that pg_dump takes
a surprisingly large amount of time to dump a schema with a large number
of views. The client's hardware is quite spiffy, and yet pg_dump is
taking many minutes to dump a schema with some 35,000 views. Here's a
simple test case:

    create schema views;
    do 'begin for i in 1 .. 10000 loop execute $$create view views.v_$$
    || i ||$$ as select current_date as d, current_timestamp as ts,
    $_$a$_$::text || n as t, n from generate_series(1,5) as n$$; end
    loop; end;';

On my modest hardware this database took 4m18.864s for pg_dump to run.
Should we be looking at replacing the retail operations which consume
most of this time with something that runs faster?

There is also this gem of behaviour, which is where I started:

    p1                          p2
    begin;
    drop view foo;
                                pg_dump
    commit;
                                boom.

with this error:

    2011-10-01 16:38:20 EDT [27084] 30063 ERROR:  could not open
    relation with OID 133640
    2011-10-01 16:38:20 EDT [27084] 30064 STATEMENT:  SELECT
    pg_catalog.pg_get_viewdef('133640'::pg_catalog.oid) AS viewdef

Of course, this isn't caused by having a large catalog, but it's
terrible nevertheless. I'm not sure what to do about it.

cheers

andrew
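For reference, the "retail" pattern is visible in the STATEMENT line of the
log above: pg_dump asks the server for each view definition one query at a
time. A single batched pass over the test schema would look roughly like the
sketch below; this is only an illustration of the idea against the views
schema created by the test case, not anything pg_dump currently does:

    -- what pg_dump effectively does, once per view (retail):
    SELECT pg_catalog.pg_get_viewdef('views.v_1'::regclass) AS viewdef;

    -- one pass over every view in the test schema instead:
    SELECT c.oid, c.relname,
           pg_catalog.pg_get_viewdef(c.oid) AS viewdef
    FROM pg_catalog.pg_class c
         JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relkind = 'v' AND n.nspname = 'views';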
On 10/01/2011 05:08 PM, Andrew Dunstan wrote:
> While investigating a client problem I just observed that pg_dump takes
> a surprisingly large amount of time to dump a schema with a large number
> of views. The client's hardware is quite spiffy, and yet pg_dump is
> taking many minutes to dump a schema with some 35,000 views. Here's a
> simple test case:
>
>     create schema views;
>     do 'begin for i in 1 .. 10000 loop execute $$create view views.v_$$
>     || i ||$$ as select current_date as d, current_timestamp as ts,
>     $_$a$_$::text || n as t, n from generate_series(1,5) as n$$; end
>     loop; end;';
>
> On my modest hardware this database took 4m18.864s for pg_dump to run.
> Should we be looking at replacing the retail operations which consume
> most of this time with something that runs faster?

How modest?  Was there anything else in the database?  I tried with 9000
views (because I didn't want to bother increasing
max_locks_per_transaction) and the pg_dump ran in less than 10 seconds
(8.991s), redirecting the (plain-text) output to a file (this is on a
Core i5).

> There is also this gem of behaviour, which is where I started:
>
>     p1                          p2
>     begin;
>     drop view foo;
>                                 pg_dump
>     commit;
>                                 boom.
>
> with this error:
>
>     2011-10-01 16:38:20 EDT [27084] 30063 ERROR:  could not open
>     relation with OID 133640
>     2011-10-01 16:38:20 EDT [27084] 30064 STATEMENT:  SELECT
>     pg_catalog.pg_get_viewdef('133640'::pg_catalog.oid) AS viewdef
>
> Of course, this isn't caused by having a large catalog, but it's
> terrible nevertheless. I'm not sure what to do about it.

Couldn't you run pg_dump with --lock-wait-timeout?

Joe
On 10/01/2011 05:48 PM, Joe Abbate wrote:
> On 10/01/2011 05:08 PM, Andrew Dunstan wrote:
>> While investigating a client problem I just observed that pg_dump takes
>> a surprisingly large amount of time to dump a schema with a large number
>> of views. The client's hardware is quite spiffy, and yet pg_dump is
>> taking many minutes to dump a schema with some 35,000 views. Here's a
>> simple test case:
>>
>>     create schema views;
>>     do 'begin for i in 1 .. 10000 loop execute $$create view views.v_$$
>>     || i ||$$ as select current_date as d, current_timestamp as ts,
>>     $_$a$_$::text || n as t, n from generate_series(1,5) as n$$; end
>>     loop; end;';
>>
>> On my modest hardware this database took 4m18.864s for pg_dump to run.
>> Should we be looking at replacing the retail operations which consume
>> most of this time with something that runs faster?
>
> How modest?  Was there anything else in the database?  I tried with 9000
> views (because I didn't want to bother increasing
> max_locks_per_transaction) and the pg_dump ran in less than 10 seconds
> (8.991s), redirecting the (plain-text) output to a file (this is on a
> Core i5).

Yeah, it must be pretty modest :-) On more powerful h/w I get the same.
I need to dig further into why it's taking so long to dump my client's
schema on server-class hardware.

>> There is also this gem of behaviour, which is where I started:
>>
>>     p1                          p2
>>     begin;
>>     drop view foo;
>>                                 pg_dump
>>     commit;
>>                                 boom.
>>
>> with this error:
>>
>>     2011-10-01 16:38:20 EDT [27084] 30063 ERROR:  could not open
>>     relation with OID 133640
>>     2011-10-01 16:38:20 EDT [27084] 30064 STATEMENT:  SELECT
>>     pg_catalog.pg_get_viewdef('133640'::pg_catalog.oid) AS viewdef
>>
>> Of course, this isn't caused by having a large catalog, but it's
>> terrible nevertheless. I'm not sure what to do about it.
>
> Couldn't you run pg_dump with --lock-wait-timeout?

How would that help? This isn't a lock failure.

cheers

andrew
Hi Andrew,

On 10/01/2011 09:46 PM, Andrew Dunstan wrote:
> On 10/01/2011 05:48 PM, Joe Abbate wrote:
>> On 10/01/2011 05:08 PM, Andrew Dunstan wrote:
>>> There is also this gem of behaviour, which is where I started:
>>>
>>>     p1                          p2
>>>     begin;
>>>     drop view foo;
>>>                                 pg_dump
>>>     commit;
>>>                                 boom.
>>>
>>> with this error:
>>>
>>>     2011-10-01 16:38:20 EDT [27084] 30063 ERROR:  could not open
>>>     relation with OID 133640
>>>     2011-10-01 16:38:20 EDT [27084] 30064 STATEMENT:  SELECT
>>>     pg_catalog.pg_get_viewdef('133640'::pg_catalog.oid) AS viewdef
>>>
>>> Of course, this isn't caused by having a large catalog, but it's
>>> terrible nevertheless. I'm not sure what to do about it.
>>
>> Couldn't you run pg_dump with --lock-wait-timeout?
>
> How would that help? This isn't a lock failure.

I misinterpreted the error.  I have confirmed the behavior you see.  I'm
somewhat surprised there appears to be no ability to lock a database
exclusively for something like pg_dump/pg_restore, so you're not
surprised by concurrent activity against the catalogs.  I'm guessing the
assumption is that MVCC will take care of that?

Joe
Andrew Dunstan <andrew@dunslane.net> writes:
> While investigating a client problem I just observed that pg_dump takes
> a surprisingly large amount of time to dump a schema with a large number
> of views. The client's hardware is quite spiffy, and yet pg_dump is
> taking many minutes to dump a schema with some 35,000 views. Here's a
> simple test case:

>     create schema views;
>     do 'begin for i in 1 .. 10000 loop execute $$create view views.v_$$
>     || i ||$$ as select current_date as d, current_timestamp as ts,
>     $_$a$_$::text || n as t, n from generate_series(1,5) as n$$; end
>     loop; end;';

> On my modest hardware this database took 4m18.864s for pg_dump to run.

It takes about that on my machine too ... with --enable-cassert.
oprofile said that 90% of the runtime was going into AllocSetCheck,
so I rebuilt without cassert, and the runtime dropped to 16 seconds.
What were you testing?

(Without cassert, it looks like LockReassignCurrentOwner is the next
biggest time sink; I'm wondering if there's some sort of O(N^2) behavior
in there.)

			regards, tom lane
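A quick way to check whether the server you're timing against was built
with --enable-cassert is to look at the debug_assertions setting from any
session:

    -- shows 'on' for an assert-enabled build, 'off' otherwise
    SHOW debug_assertions;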
On Oct2, 2011, at 23:15 , Joe Abbate wrote:
> I'm somewhat surprised there appears to be no ability to lock a database
> exclusively for something like pg_dump/pg_restore, so you're not
> surprised by concurrent activity against the catalogs.  I'm guessing the
> assumption is that MVCC will take care of that?

I think the hope is more that it will, one day.

Currently, postgres internally accesses the catalog with SnapshotNow, not
with an MVCC snapshot. This is necessary to ensure, for example, that rows
inserted into a table also get inserted into a newly created index.

This wouldn't affect pg_dump if it only accessed the catalog via SQL, but
it doesn't. pg_dump also depends on some server-side functions to do its
work, and since these functions in turn use SnapshotNow-based internal
backend functions, pg_dump essentially uses a mix of SnapshotNow and its
transaction's MVCC snapshot.

There has been talk about reducing the use of SnapshotNow for catalog
access, but AFAIK there's no concrete proposal, and certainly no patch,
available.

best regards,
Florian Pflug
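To make that mix concrete, here is a rough by-hand reproduction of the
failure, with one session standing in for pg_dump. This is only a sketch
against the views.v_1 view from Andrew's test case, on a server of that
era; the exact error text may differ on other versions:

    -- session A: plays the role of pg_dump
    begin isolation level repeatable read;
    select 'views.v_1'::regclass::oid;   -- note the OID, say 133640

    -- session B: the concurrent DDL from the p1 column earlier
    drop view views.v_1;

    -- session A: the transaction's MVCC snapshot still sees the view ...
    select count(*) from pg_catalog.pg_class where relname = 'v_1';
    -- ... but the SnapshotNow-based backend function no longer can:
    select pg_catalog.pg_get_viewdef(133640::oid);  -- use the OID noted above
    -- fails with something like:
    --   ERROR:  could not open relation with OID 133640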
On Sat, Oct 1, 2011 at 9:46 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> How would that help? This isn't a lock failure.

It is, rather, a failure to lock.  Currently, LOCK TABLE only works on
tables, and pg_dump only applies it to tables.  If the offending object
had been a table rather than a view, pg_dump would (I believe) have
blocked trying to obtain an AccessShareLock against the existing
AccessExclusiveLock.

We talked about allowing locks on other types of relations, but due to
some bad syntax choices in the past it's not completely obvious how to
shoehorn that in.

http://archives.postgresql.org/pgsql-hackers/2011-06/msg00119.php

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
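The gap is easy to see by hand on a server of that vintage (again using a
view from the earlier test case; the exact error wording may vary by
version):

    begin;
    lock table views.v_1 in access share mode;
    -- fails with something like:
    --   ERROR:  "v_1" is not a table
    rollback;

    -- the same statement against an ordinary table simply takes the lock,
    -- and would queue behind a concurrent DROP's AccessExclusiveLock.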
Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, Oct 1, 2011 at 9:46 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>> How would that help? This isn't a lock failure.

> It is, rather, a failure to lock.  Currently, LOCK TABLE only works on
> tables, and pg_dump only applies it to tables.  If the offending object
> had been a table rather than a view, pg_dump would (I believe) have
> blocked trying to obtain an AccessShareLock against the existing
> AccessExclusiveLock.

Yeah, and it would still have failed once the lock was released.  Short
of providing some sort of global DDL-blocking lock (with attendant
performance consequences) it's not clear how to create an entirely
bulletproof solution here.

This isn't a new problem --- we've been aware of pg_dump's limitations
in this area for many years.

			regards, tom lane
Excerpts from Tom Lane's message of lun oct 03 01:47:18 -0300 2011:
> (Without cassert, it looks like LockReassignCurrentOwner is the next
> biggest time sink; I'm wondering if there's some sort of O(N^2) behavior
> in there.)

That seems fishy.  Even if there weren't quadratic behavior, should this
be called at all?  AFAIK it should only be used in cases where there are
subtransactions at work, and I don't think pg_dump uses them.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On 10/03/2011 12:47 AM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> While investigating a client problem I just observed that pg_dump takes
>> a surprisingly large amount of time to dump a schema with a large number
>> of views. The client's hardware is quite spiffy, and yet pg_dump is
>> taking many minutes to dump a schema with some 35,000 views. Here's a
>> simple test case:
>>     create schema views;
>>     do 'begin for i in 1 .. 10000 loop execute $$create view views.v_$$
>>     || i ||$$ as select current_date as d, current_timestamp as ts,
>>     $_$a$_$::text || n as t, n from generate_series(1,5) as n$$; end
>>     loop; end;';
>> On my modest hardware this database took 4m18.864s for pg_dump to run.
>
> It takes about that on my machine too ... with --enable-cassert.
> oprofile said that 90% of the runtime was going into AllocSetCheck,
> so I rebuilt without cassert, and the runtime dropped to 16 seconds.
> What were you testing?

Yeah, you're right, that must have been it. That's a big hit; I didn't
realise cassert was so heavy. (Note to self: test with production build
settings.)

I don't seem to be batting 1000 ... I still need to get to the bottom of
why the client's machine is taking so long.

I do notice that we seem to be doing a lot of repetitive tasks, e.g.
calling pg_format_type() over and over for the same arguments. Would we
be better off caching that?

cheers

andrew
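A rough, server-side way to gauge how much a cache of those per-column type
lookups could save (just a sketch; pg_dump does the lookups from its own
client-side code, this only measures how repetitive the arguments are):

    -- column-level lookups vs. distinct (type, typmod) combinations;
    -- a big gap suggests a cache keyed on those two values would pay off
    SELECT count(*) AS column_lookups,
           count(DISTINCT (a.atttypid, a.atttypmod)) AS distinct_types
    FROM pg_catalog.pg_attribute a
         JOIN pg_catalog.pg_class c ON c.oid = a.attrelid
    WHERE c.relkind IN ('r', 'v')
      AND a.attnum > 0
      AND NOT a.attisdropped;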
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Excerpts from Tom Lane's message of lun oct 03 01:47:18 -0300 2011:
>> (Without cassert, it looks like LockReassignCurrentOwner is the next
>> biggest time sink; I'm wondering if there's some sort of O(N^2) behavior
>> in there.)

> That seems fishy.  Even if there weren't quadratic behavior, should this
> be called at all?  AFAIK it should only be used in cases where there are
> subtransactions at work, and I don't think pg_dump uses them.

I wondered that too, but the calls are legit --- they're coming from
PortalDrop.  It appears that most of the calls don't actually have
anything to do, but they're iterating through a rather large local lock
table to find that out.  We probably ought to think of a way to avoid
that.  The trick is to not make performance worse for typical small
transactions that aren't holding many locks (which I think was the
design center for this to begin with).

			regards, tom lane