Re: pg_dump and thousands of schemas - Mailing list pgsql-performance

From Tom Lane
Subject Re: pg_dump and thousands of schemas
Date
Msg-id 24045.1337964977@sss.pgh.pa.us
Whole thread Raw
In response to Re: pg_dump and thousands of schemas  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: pg_dump and thousands of schemas
List pgsql-performance
Jeff Janes <jeff.janes@gmail.com> writes:
> There is an operation in pg_dump which is O(#_of_schemata_in_db *
> #_of_table_in_db), or something like that.
> The attached very crude patch reduces that to
> O(log_of_#_of_schemata_in_db * #_of_table_in_db)

> I was hoping this would be a general improvement.  It doesn't seem be.
>  But it is a very substantial improvement in the specific case of
> dumping one small schema out of a very large database.

Your test case in
<CAMkU=1zedM4VyLVyLuVmoekUnUXkXfnGPer+3bvPm-A_9CNYSA@mail.gmail.com>
shows pretty conclusively that findNamespace is a time sink for large
numbers of schemas, so that seems worth fixing.  I don't like this
patch though: we already have infrastructure for this in pg_dump,
namely buildIndexArray/findObjectByOid, so what we should do is use
that not invent something new.  I will go see about doing that.

> It seems like dumping one schema would be better optimized by not
> loading up the entire database catalog, but rather by restricting to
> just that schema at the catalog stage.

The reason pg_dump is not built that way is that considerations like
dump order dependencies are not going to work at all if it only looks
at a subset of the database.  Of course, dependency chains involving
objects not dumped might be problematic anyway, but I'd still want it
to do the best it could.

> For dumping entire databases, It looks like the biggest problem is
> going to be LockReassignCurrentOwner in the server.  And that doesn't
> seem to be easy to fix, as any change to it to improve pg_dump will
> risk degrading normal use cases.

I didn't try profiling the server side, but pg_dump doesn't use
subtransactions so it's not clear to me why LockReassignCurrentOwner
would get called at all ...

            regards, tom lane

pgsql-performance by date:

Previous
From: Jeff Janes
Date:
Subject: Re: pg_dump and thousands of schemas
Next
From: Jeff Janes
Date:
Subject: Re: pg_dump and thousands of schemas