From: Jeff Janes
Subject: Re: pg_dump and thousands of schemas
Msg-id: CAMkU=1yjA-xQXnFTQVWa5jq-JPXyRdm5RXQQtADDm4T7vKhHoA@mail.gmail.com
In response to: Re: pg_dump and thousands of schemas (Bruce Momjian <bruce@momjian.us>)
List: pgsql-performance
On Thu, May 24, 2012 at 8:54 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Thu, May 24, 2012 at 08:20:34PM -0700, Jeff Janes wrote:

>> pg_dump itself seems to have a lot of quadratic portions (plus another
>> one on the server which it hits pretty heavily), and it's hard to know
>> where to start addressing them.  It seems like addressing the overall
>> quadratic nature might be a globally better option, but addressing
>> just the problem with dumping one schema might be easier to kluge
>> together.
>
> Postgres 9.2 will have some speedups for pg_dump scanning large
> databases --- that might help.

Those speedups don't seem to apply here, though.  I get the same
performance in 9.0.7 as in 9.2beta1.

There is an operation in pg_dump which is O(#_of_schemata_in_db *
#_of_tables_in_db), or something like that.

The attached very crude patch reduces that to
O(log_of_#_of_schemata_in_db * #_of_tables_in_db).
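
To make that concrete, the shape of the change is roughly this
(illustrative C only, not the actual patch; the struct and function
names here are made up for the example):

#include <stdlib.h>

typedef unsigned int Oid;

typedef struct NamespaceInfo
{
    Oid   nspoid;
    char *nspname;
} NamespaceInfo;

/* Compare two NamespaceInfo entries by OID, for qsort/bsearch. */
static int
nsp_cmp(const void *a, const void *b)
{
    Oid x = ((const NamespaceInfo *) a)->nspoid;
    Oid y = ((const NamespaceInfo *) b)->nspoid;

    return (x > y) - (x < y);
}

/* Sort the namespace array once after reading pg_namespace:
 *     qsort(nsinfo, nnamespaces, sizeof(NamespaceInfo), nsp_cmp);
 * then each per-table lookup is O(log #schemata) instead of a
 * linear scan over every schema. */
static NamespaceInfo *
findNamespaceByOid(NamespaceInfo *nsinfo, size_t nnamespaces, Oid oid)
{
    NamespaceInfo key;

    key.nspoid = oid;
    return bsearch(&key, nsinfo, nnamespaces,
                   sizeof(NamespaceInfo), nsp_cmp);
}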

I was hoping this would be a general improvement.  It doesn't seem to
be.  But it is a very substantial improvement in the specific case of
dumping one small schema out of a very large database.

It seems like dumping one schema would be better optimized by not
loading up the entire database catalog, but rather by restricting to
just that schema at the catalog stage.  But I haven't dug into those
details.
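
In outline, though, it would look something like pushing the filter
into the catalog query itself (hypothetical C, not pg_dump's actual
structure; real code would also need to quote the schema name
properly):

#include <stdio.h>

/* Build the catalog query for tables, optionally restricted to one
 * schema (as with pg_dump -n <schema>), so the server never returns
 * rows for the other tens of thousands of schemas. */
static void
build_table_query(char *buf, size_t buflen, const char *schema)
{
    int n = snprintf(buf, buflen,
                     "SELECT c.oid, c.relname, c.relnamespace "
                     "FROM pg_catalog.pg_class c "
                     "JOIN pg_catalog.pg_namespace n "
                     "  ON n.oid = c.relnamespace "
                     "WHERE c.relkind = 'r'");

    if (schema != NULL)
        snprintf(buf + n, buflen - n,
                 " AND n.nspname = '%s'", schema);
}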

For dumping entire databases, it looks like the biggest problem is
going to be LockReassignCurrentOwner in the server.  And that doesn't
seem to be easy to fix, as any change to it to improve pg_dump will
risk degrading normal use cases.

If we want to be able to efficiently dump entire databases in a
scalable way, it seems like there should be some way to obtain a
database-wide AccessShare lock, which blocks AccessExclusive locks on
any object in the database, and turns ordinary object-level
AccessShare lock requests into no-ops.  I don't think you can get
hundreds of thousands of simultaneously held and individually recorded
AccessShare locks without causing bad things to happen.
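
Purely as a sketch of the idea (made-up names, and not how the lock
manager is actually structured):

typedef unsigned int Oid;
typedef enum { ACCESS_SHARE, ACCESS_EXCLUSIVE } LockMode;

/* Hypothetical per-session flag: set once when the database-wide
 * AccessShare lock is granted. */
static int session_holds_dbwide_share = 0;

static void
record_object_lock(Oid relid, LockMode mode)
{
    /* ... insert into the per-session lock table ... */
    (void) relid;
    (void) mode;
}

static void
lock_relation(Oid relid, LockMode mode)
{
    /* An ordinary AccessShare request becomes a no-op when it is
     * already covered by the database-wide lock, so a dump records
     * one lock instead of hundreds of thousands. */
    if (mode == ACCESS_SHARE && session_holds_dbwide_share)
        return;

    record_object_lock(relid, mode);
}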

Cheers,

Jeff

