Re: pg_dump and thousands of schemas - Mailing list pgsql-performance

From Tom Lane
Subject Re: pg_dump and thousands of schemas
Msg-id 15138.1338243996@sss.pgh.pa.us
In response to Re: pg_dump and thousands of schemas  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: pg_dump and thousands of schemas
List pgsql-performance
Jeff Janes <jeff.janes@gmail.com> writes:
> There is a quadratic behavior in pg_dump's "mark_create_done".  This
> should probably be fixed, but in the mean time it can be circumvented
> by using -Fc rather than -Fp for the dump format.  Doing that removed
> 17 minutes from the run time.

Hmm, that would just amount to postponing the work from pg_dump to
pg_restore --- although I suppose it could be a win if the dump is for
backup purposes and you probably won't ever have to restore it.
inhibit_data_for_failed_table() has the same issue, though perhaps it's
less likely to be exercised; and there is a previously noted O(N^2)
behavior for the loop around repoint_table_dependencies.

We could fix these things by setting up index arrays that map dump ID
to TocEntry pointer and dump ID of a table to dump ID of its TABLE DATA
TocEntry.  The first of these already exists (tocsByDumpId) but is
currently built only if doing parallel restore.  We'd have to build it
all the time to use it for fixing mark_create_done.  Still, the extra
space is small compared to the size of the TocEntry data structures,
so I don't see that that's a serious objection.
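
As a rough illustration of the two index arrays described above, here is a minimal sketch in C. It is not pg_dump's actual code. The TocEntry struct is a simplified stand-in, and the tableDumpId and created fields, the DumpIdIndex struct, and all function names are assumptions made for the example. Only the tocsByDumpId name comes from the message above. The point is that, once both arrays are built in one pass, finding a table's TABLE DATA entry is an O(1) lookup instead of a scan over the whole TOC.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for a TOC entry; the fields are illustrative only. */
typedef struct TocEntry
{
    int         dumpId;       /* unique, in the range 1..maxDumpId */
    const char *desc;         /* e.g. "TABLE", "TABLE DATA" */
    int         tableDumpId;  /* hypothetical: for TABLE DATA, the dump ID
                               * of the owning table; 0 otherwise */
    int         created;      /* hypothetical flag set by the sketch below */
} TocEntry;

/* Hypothetical container for the two index arrays. */
typedef struct DumpIdIndex
{
    int        maxDumpId;
    TocEntry **tocsByDumpId;  /* dump ID -> entry, NULL if none */
    int       *tableDataId;   /* table dump ID -> TABLE DATA dump ID, 0 if none */
} DumpIdIndex;

/* Build both arrays in a single pass. Returns 0 on success, -1 on OOM. */
static int
build_index(DumpIdIndex *idx, TocEntry *entries, int n, int maxDumpId)
{
    int i;

    idx->maxDumpId = maxDumpId;
    idx->tocsByDumpId = calloc((size_t) maxDumpId + 1, sizeof(TocEntry *));
    idx->tableDataId = calloc((size_t) maxDumpId + 1, sizeof(int));
    if (idx->tocsByDumpId == NULL || idx->tableDataId == NULL)
    {
        free(idx->tocsByDumpId);
        free(idx->tableDataId);
        return -1;
    }

    for (i = 0; i < n; i++)
    {
        TocEntry *te = &entries[i];

        assert(te->dumpId > 0 && te->dumpId <= maxDumpId);
        idx->tocsByDumpId[te->dumpId] = te;

        if (strcmp(te->desc, "TABLE DATA") == 0 && te->tableDumpId > 0)
        {
            assert(te->tableDumpId <= maxDumpId);
            idx->tableDataId[te->tableDumpId] = te->dumpId;
        }
    }
    return 0;
}

/* O(1) lookup of a table's TABLE DATA entry; NULL if there is none. */
static TocEntry *
table_data_for(const DumpIdIndex *idx, int tableDumpId)
{
    int dataId;

    if (tableDumpId <= 0 || tableDumpId > idx->maxDumpId)
        return NULL;
    dataId = idx->tableDataId[tableDumpId];
    return dataId ? idx->tocsByDumpId[dataId] : NULL;
}

/* Sketch of the lookup a mark_create_done-style function could do. */
static void
mark_create_done_sketch(const DumpIdIndex *idx, const TocEntry *table)
{
    TocEntry *data = table_data_for(idx, table->dumpId);

    if (data != NULL)
        data->created = 1;
}

static void
free_index(DumpIdIndex *idx)
{
    free(idx->tocsByDumpId);
    free(idx->tableDataId);
}
```

The extra memory is one pointer and one int per dump ID, which matches the remark above that the space cost is small next to the TocEntry structures themselves.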

I have nothing else to do right now so am a bit tempted to go fix this.

> I'm working on a patch to reduce the LockReassignCurrentOwner problem
> in the server when using pg_dump with lots of objects.

Cool.

            regards, tom lane
