Re: pg_dump and thousands of schemas - Mailing list pgsql-performance

From: Jeff Janes
Subject: Re: pg_dump and thousands of schemas
Msg-id: CAMkU=1zxd_vn5DPw9W6cKe==f=pxScbQE=cd7Gq69eO0YjGsTg@mail.gmail.com
In response to: Re: pg_dump and thousands of schemas ("Hugo <Nabble>" <hugo.tech@gmail.com>)
List: pgsql-performance
On Sat, May 26, 2012 at 9:12 PM, Hugo <Nabble> <hugo.tech@gmail.com> wrote:
> Here is a sample dump that takes a long time to be written by pg_dump:
> http://postgresql.1045698.n5.nabble.com/file/n5710183/test.dump.tar.gz
> (the file above has 2.4Mb, the dump itself has 66Mb)
>
> This database has 2,311 schemas similar to those in my production database.
> All schemas are empty,

This dump does not reload cleanly.  It uses many roles which it
doesn't create.  Also, the schemata are not empty; they have about 20
tables apiece.

I created the missing roles with all default options.
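
For anyone trying to reproduce this, that step is roughly the
following (the role name is just a placeholder; use whatever roles
the restore complains about):

    # Repeat once per missing role; all-default options are fine for this test
    psql -d postgres -c "CREATE ROLE some_missing_role;"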

Doing a default pg_dump took 66 minutes.

> but pg_dump still takes 3 hours to finish it on my
> computer. So now you can imagine my production database with more than
> 20,000 schemas like that. Can you guys take a look and see if the code has
> room for improvements?

There is a quadratic behavior in pg_dump's "mark_create_done".  This
should probably be fixed, but in the meantime it can be circumvented
by using -Fc rather than -Fp for the dump format.  Doing that removed
17 minutes from the run time.
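
For reference, the two invocations look roughly like this (database
and file names are placeholders):

    # Plain-text format (-Fp, the default) goes through the slow path described above
    pg_dump -Fp -f test_plain.sql testdb

    # Custom format (-Fc) sidesteps it; restore with pg_restore instead of psql
    pg_dump -Fc -f test_custom.dump testdb
    pg_restore -d newdb test_custom.dump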

I'm working on a patch to reduce the LockReassignCurrentOwner problem
in the server when using pg_dump with lots of objects.  Using a
preliminary version of that patch, in conjunction with -Fc, reduced
the dump time to 3.5 minutes.

Cheers,

Jeff
