Thread: pg_dump and thousands of schemas

pg_dump and thousands of schemas

From
"Hugo "
Date:
Hi everyone,

We have a production database (postgresql 9.0) with more than 20,000 schemas
and 40Gb size. In the past we had all that information in just one schema
and pg_dump used to work just fine (2-3 hours to dump everything). Then we
decided to split the database into schemas, which makes a lot of sense for
the kind of information we store and the plans we have for the future. The
problem now is that pg_dump takes forever to finish (more than 24 hours) and
we just can't have consistent daily backups like we had in the past. When I
try to dump just one schema with almost nothing in it, it takes 12 minutes.
When I try to dump a big schema with lots of information, it takes 14
minutes. So pg_dump is clearly lost in the middle of so many schemas. The
load on the machine is low (it is a hot standby replica db) and we have good
configurations for memory, cache, shared_buffers and everything else. The
performance of the database itself is good, it is only pg_dump that is
inefficient for the task. I have found an old discussion back in 2007 that
seems to be quite related to this problem:

http://postgresql.1045698.n5.nabble.com/5-minutes-to-pg-dump-nothing-tp1888814.html

It seems that pg_dump hasn't been tested with a huge number of schemas like
that. Does anyone have a solution or suggestions? Do you know if there are
patches specific for this case?

Thanks in advance,
Hugo


Re: pg_dump and thousands of schemas

From
Craig James
Date:


On Thu, May 24, 2012 at 12:06 AM, Hugo <Nabble> <hugo.tech@gmail.com> wrote:
> Hi everyone,
>
> We have a production database (postgresql 9.0) with more than 20,000 schemas
> and 40Gb size. In the past we had all that information in just one schema
> and pg_dump used to work just fine (2-3 hours to dump everything). Then we
> decided to split the database into schemas, which makes a lot of sense for
> the kind of information we store and the plans we have for the future. The
> problem now is that pg_dump takes forever to finish (more than 24 hours) and
> we just can't have consistent daily backups like we had in the past. When I
> try to dump just one schema with almost nothing in it, it takes 12 minutes.
> When I try to dump a big schema with lots of information, it takes 14
> minutes. So pg_dump is clearly lost in the middle of so many schemas. The
> load on the machine is low (it is a hot standby replica db) and we have good
> configurations for memory, cache, shared_buffers and everything else. The
> performance of the database itself is good, it is only pg_dump that is
> inefficient for the task. I have found an old discussion back in 2007 that
> seems to be quite related to this problem:
>
> http://postgresql.1045698.n5.nabble.com/5-minutes-to-pg-dump-nothing-tp1888814.html
>
> It seems that pg_dump hasn't been tested with a huge number of schemas like
> that. Does anyone have a solution or suggestions? Do you know if there are
> patches specific for this case?

How many total relations do you have?  I don't know if there is a limit to the number of schemas, but I suspect when you went from one schema to 20,000 schemas, you also went from N relations to 20000*N relations.
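For reference, those totals can be read straight from the system catalogs. A rough sketch (the raw pg_class count includes indexes, sequences, TOAST tables, etc. unless you filter on relkind):

-- Total relations and schemas (everything in pg_class / pg_namespace):
SELECT count(*) AS total_relations FROM pg_catalog.pg_class;
SELECT count(*) AS total_schemas   FROM pg_catalog.pg_namespace;

-- Ordinary tables only, per schema:
SELECT n.nspname, count(*) AS tables
FROM pg_catalog.pg_class c
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
GROUP BY n.nspname
ORDER BY tables DESC
LIMIT 10;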

Somewhere between 100,000 and 1 million total relations, Postgres starts to have trouble.  See this thread:

   http://permalink.gmane.org/gmane.comp.db.postgresql.performance/33254

(Why is it that Google can't find these archives on postgresql.org?)

Craig




Re: pg_dump and thousands of schemas

From
Jeff Janes
Date:
On Thu, May 24, 2012 at 8:21 AM, Craig James <cjames@emolecules.com> wrote:
>
>
> On Thu, May 24, 2012 at 12:06 AM, Hugo <Nabble> <hugo.tech@gmail.com> wrote:
>>
>> Hi everyone,
>>
>> We have a production database (postgresql 9.0) with more than 20,000
>> schemas
>> and 40Gb size. In the past we had all that information in just one schema
>> and pg_dump used to work just fine (2-3 hours to dump everything). Then we
>> decided to split the database into schemas, which makes a lot of sense for
>> the kind of information we store and the plans we have for the future. The
>> problem now is that pg_dump takes forever to finish (more than 24 hours)
>> and
>> we just can't have consistent daily backups like we had in the past. When
>> I
>> try to dump just one schema with almost nothing in it, it takes 12
>> minutes.

Sorry, your original did not show up here, so I'm piggy-backing on
Craig's reply.

Is dumping just one schema out of thousands an actual use case, or is
it just an attempt to find a faster way to dump all the schemata
through a back door?

pg_dump itself seems to have a lot of quadratic portions (plus another
one on the server which it hits pretty heavily), and it is hard to know
where to start addressing them.  It seems like addressing the overall
quadratic nature might be a globally better option, but addressing
just the problem with dumping one schema might be easier to kluge
together.

>> When I try to dump a big schema with lots of information, it takes 14
>> minutes. So pg_dump is clearly lost in the middle of so many schemas. The
>> load on the machine is low (it is a hot standby replica db) and we have
>> good
>> configurations for memory, cache, shared_buffers and everything else. The
>> performance of the database itself is good, it is only pg_dump that is
>> inefficient for the task. I have found an old discussion back in 2007 that
>> seems to be quite related to this problem:
>>
>>
>> http://postgresql.1045698.n5.nabble.com/5-minutes-to-pg-dump-nothing-tp1888814.html
>>
>> It seems that pg_dump hasn't been tested with a huge number of schemas
>> like
>> that. Does anyone have a solution or suggestions? Do you know if there are
>> patches specific for this case?
>
>
> How many total relations do you have?  I don't know if there is a limit to
> the number of schemas, but I suspect when you went from one schema to 20,000
> schemas, you also went from N relations to 20000*N relations.

Yes, that might be important to know--whether the total number of
relations changed, or just their distribution amongst the schemata.

Cheers,

Jeff

Re: pg_dump and thousands of schemas

From
Bruce Momjian
Date:
On Thu, May 24, 2012 at 08:20:34PM -0700, Jeff Janes wrote:
> On Thu, May 24, 2012 at 8:21 AM, Craig James <cjames@emolecules.com> wrote:
> >
> >
> > On Thu, May 24, 2012 at 12:06 AM, Hugo <Nabble> <hugo.tech@gmail.com> wrote:
> >>
> >> Hi everyone,
> >>
> >> We have a production database (postgresql 9.0) with more than 20,000
> >> schemas
> >> and 40Gb size. In the past we had all that information in just one schema
> >> and pg_dump used to work just fine (2-3 hours to dump everything). Then we
> >> decided to split the database into schemas, which makes a lot of sense for
> >> the kind of information we store and the plans we have for the future. The
> >> problem now is that pg_dump takes forever to finish (more than 24 hours)
> >> and
> >> we just can't have consistent daily backups like we had in the past. When
> >> I
> >> try to dump just one schema with almost nothing in it, it takes 12
> >> minutes.
>
> Sorry, your original did not show up here, so I'm piggy-backing on
> Craig's reply.
>
> Is dumping just one schema out of thousands an actual use case, or is
> it just an attempt to find a faster way to dump all the schemata
> through a back door?
>
> pg_dump itself seems to have a lot of quadratic portions (plus another
> one on the server which it hits pretty heavily), and it hard to know
> where to start addressing them.  It seems like addressing the overall
> quadratic nature might be a globally better option, but addressing
> just the problem with dumping one schema might be easier to kluge
> together.

Postgres 9.2 will have some speedups for pg_dump scanning large
databases --- that might help.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Re: pg_dump and thousands of schemas

From
"Hugo "
Date:
Thanks for the replies. The number of relations in the database is really
high (~500,000) and I don't think we can shrink that. The truth is that
schemas bring a lot of advantages to our system and postgresql doesn't show
signs of stress with them. So I believe it should also be possible for
pg_dump to handle them with the same elegance.

Dumping just one schema out of thousands was indeed an attempt to find a
faster way to backup the database. I don't mind creating a shell script or
program that dumps every schema individually as long as each dump is fast
enough to keep the total time within a few hours. But since each dump
currently takes at least 12 minutes, that just doesn't work. I have been
looking at the source of pg_dump in order to find possible improvements, but
this will certainly take days or even weeks. We will probably have to use
'tar' to compress the postgresql folder as the backup solution for now until
we can fix pg_dump or wait for postgresql 9.2 to become the official version
(as long as I don't need a dump and restore to upgrade the db).
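For the per-schema route, the list of schemas to feed such a script can be pulled with something along these lines (just a sketch; adjust the filter to your own naming conventions, and pass each name to pg_dump -n):

-- User schemas only; skip the built-in pg_* and information_schema ones.
SELECT nspname
FROM pg_catalog.pg_namespace
WHERE nspname !~ '^pg_'
  AND nspname <> 'information_schema'
ORDER BY nspname;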

If anyone has more suggestions, I would like to hear them. Thank you!

Regards,
Hugo



Re: pg_dump and thousands of schemas

From
Ondrej Ivanič
Date:
Hi,

On 25 May 2012 14:54, Hugo <Nabble> <hugo.tech@gmail.com> wrote:
> Thanks for the replies. The number of relations in the database is really
> high (~500,000) and I don't think we can shrink that. The truth is that
> schemas bring a lot of advantages to our system and postgresql doesn't show
> signs of stress with them. So I believe it should also be possible for
> pg_dump to handle them with the same elegance.
>
> If anyone has more suggestions, I would like to hear them. Thank you!

Maybe a filesystem-level backup could solve this issue:
http://www.postgresql.org/docs/9.1/static/continuous-archiving.html#BACKUP-BASE-BACKUP

but keep in mind that:
- it preserves bloat in your database, so the backup might need more space
- you can't restore to different PG version
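As a rough sketch of that approach on 9.0/9.1 (it assumes WAL archiving, i.e. archive_mode and archive_command, is already configured), the filesystem copy is bracketed by the backup-control functions:

-- Start an (exclusive) online base backup; the second argument requests
-- an immediate checkpoint so the call returns quickly.
SELECT pg_start_backup('nightly', true);

-- ... copy the data directory at the filesystem level (tar, rsync, ...) ...

-- Finish the backup and wait for the required WAL to be archived.
SELECT pg_stop_backup();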

--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)

Re: pg_dump and thousands of schemas

From
Tom Lane
Date:
"Hugo <Nabble>" <hugo.tech@gmail.com> writes:
> If anyone has more suggestions, I would like to hear them. Thank you!

Provide a test case?

We recently fixed a couple of O(N^2) loops in pg_dump, but those covered
extremely specific cases that might or might not have anything to do
with what you're seeing.  The complainant was extremely helpful about
tracking down the problems:
http://archives.postgresql.org/pgsql-general/2012-03/msg00957.php
http://archives.postgresql.org/pgsql-committers/2012-03/msg00225.php
http://archives.postgresql.org/pgsql-committers/2012-03/msg00230.php

            regards, tom lane

Re: pg_dump and thousands of schemas

From
Bruce Momjian
Date:
On Fri, May 25, 2012 at 10:41:23AM -0400, Tom Lane wrote:
> "Hugo <Nabble>" <hugo.tech@gmail.com> writes:
> > If anyone has more suggestions, I would like to hear them. Thank you!
>
> Provide a test case?
>
> We recently fixed a couple of O(N^2) loops in pg_dump, but those covered
> extremely specific cases that might or might not have anything to do
> with what you're seeing.  The complainant was extremely helpful about
> tracking down the problems:
> http://archives.postgresql.org/pgsql-general/2012-03/msg00957.php
> http://archives.postgresql.org/pgsql-committers/2012-03/msg00225.php
> http://archives.postgresql.org/pgsql-committers/2012-03/msg00230.php

Yes, please help us improve this!  At this point pg_upgrade is limited
by the time to dump/restore the database schema, but I can't get users
to give me any way to debug the speed problems.

Someone reported pg_upgrade took 45 minutes because of pg_dumpall
--schema-only, which is quite long.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Re: pg_dump and thousands of schemas

From
Jeff Janes
Date:
On Thu, May 24, 2012 at 8:54 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Thu, May 24, 2012 at 08:20:34PM -0700, Jeff Janes wrote:

>> pg_dump itself seems to have a lot of quadratic portions (plus another
>> one on the server which it hits pretty heavily), and it hard to know
>> where to start addressing them.  It seems like addressing the overall
>> quadratic nature might be a globally better option, but addressing
>> just the problem with dumping one schema might be easier to kluge
>> together.
>
> Postgres 9.2 will have some speedups for pg_dump scanning large
> databases --- that might help.

Those speed ups don't seem to apply here, though.  I get the same
performance in 9.0.7 as 9.2.beta1.

There is an operation in pg_dump which is O(#_of_schemata_in_db *
#_of_table_in_db), or something like that.

The attached very crude patch reduces that to
O(log_of_#_of_schemata_in_db * #_of_table_in_db)

I was hoping this would be a general improvement.  It doesn't seem to be.
But it is a very substantial improvement in the specific case of
dumping one small schema out of a very large database.

It seems like dumping one schema would be better optimized by not
loading up the entire database catalog, but rather by restricting to
just that schema at the catalog stage.  But I haven't dug into those
details.

For dumping entire databases, it looks like the biggest problem is
going to be LockReassignCurrentOwner in the server.  And that doesn't
seem to be easy to fix, as any change to it to improve pg_dump will
risk degrading normal use cases.

If we want to be able to efficiently dump entire databases in a
scalable way, it seems like there should be some way to obtain a
data-base-wide AccessShare lock, which blocks AccessExclusive locks on
any object in the database, and turns ordinary object-level
AccessShare lock requests into no-ops.  I don't think you can get
hundreds of thousands of simultaneously held and individually recorded
AccessShare locks without causing bad things to happen.

Cheers,

Jeff


Re: pg_dump and thousands of schemas

From
Jeff Janes
Date:
On Fri, May 25, 2012 at 8:18 AM, Bruce Momjian <bruce@momjian.us> wrote:
> On Fri, May 25, 2012 at 10:41:23AM -0400, Tom Lane wrote:
>> "Hugo <Nabble>" <hugo.tech@gmail.com> writes:
>> > If anyone has more suggestions, I would like to hear them. Thank you!
>>
>> Provide a test case?
>>
>> We recently fixed a couple of O(N^2) loops in pg_dump, but those covered
>> extremely specific cases that might or might not have anything to do
>> with what you're seeing.  The complainant was extremely helpful about
>> tracking down the problems:
>> http://archives.postgresql.org/pgsql-general/2012-03/msg00957.php
>> http://archives.postgresql.org/pgsql-committers/2012-03/msg00225.php
>> http://archives.postgresql.org/pgsql-committers/2012-03/msg00230.php
>
> Yes, please help us improve this!  At this point pg_upgrade is limited
> by the time to dump/restore the database schema, but I can't get users
> to give me any way to debug the speed problems.

For dumping one small schema from a large database, look at the time
progression of this:

dropdb foo; createdb foo;

for f in `seq 0 10000 1000000`; do
  perl -le 'print "create schema foo$_; create table foo$_.foo (k integer, v integer);"
      foreach $ARGV[0]..$ARGV[0]+9999' $f | psql -d foo > /dev/null ;
  time pg_dump foo -Fc -n foo1 | wc -c;
done >& dump_one_schema_timing

To show the overall dump speed problem, drop the "-n foo1" and change
the step size from 10000/9999 down to 1000/999.

Cheers,

Jeff

Re: pg_dump and thousands of schemas

From
Tom Lane
Date:
Jeff Janes <jeff.janes@gmail.com> writes:
> There is an operation in pg_dump which is O(#_of_schemata_in_db *
> #_of_table_in_db), or something like that.
> The attached very crude patch reduces that to
> O(log_of_#_of_schemata_in_db * #_of_table_in_db)

> I was hoping this would be a general improvement.  It doesn't seem be.
>  But it is a very substantial improvement in the specific case of
> dumping one small schema out of a very large database.

Your test case in
<CAMkU=1zedM4VyLVyLuVmoekUnUXkXfnGPer+3bvPm-A_9CNYSA@mail.gmail.com>
shows pretty conclusively that findNamespace is a time sink for large
numbers of schemas, so that seems worth fixing.  I don't like this
patch though: we already have infrastructure for this in pg_dump,
namely buildIndexArray/findObjectByOid, so what we should do is use
that, not invent something new.  I will go see about doing that.

> It seems like dumping one schema would be better optimized by not
> loading up the entire database catalog, but rather by restricting to
> just that schema at the catalog stage.

The reason pg_dump is not built that way is that considerations like
dump order dependencies are not going to work at all if it only looks
at a subset of the database.  Of course, dependency chains involving
objects not dumped might be problematic anyway, but I'd still want it
to do the best it could.

> For dumping entire databases, It looks like the biggest problem is
> going to be LockReassignCurrentOwner in the server.  And that doesn't
> seem to be easy to fix, as any change to it to improve pg_dump will
> risk degrading normal use cases.

I didn't try profiling the server side, but pg_dump doesn't use
subtransactions so it's not clear to me why LockReassignCurrentOwner
would get called at all ...

            regards, tom lane

Re: pg_dump and thousands of schemas

From
Jeff Janes
Date:
On Fri, May 25, 2012 at 9:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>
>> For dumping entire databases, It looks like the biggest problem is
>> going to be LockReassignCurrentOwner in the server.  And that doesn't
>> seem to be easy to fix, as any change to it to improve pg_dump will
>> risk degrading normal use cases.
>
> I didn't try profiling the server side, but pg_dump doesn't use
> subtransactions so it's not clear to me why LockReassignCurrentOwner
> would get called at all ...

I thought that every select statement in a repeatable read transaction
ran in a separate "portal", and that a portal is a flavor of
subtransaction.  Anyway, it does show up at the top of a profile of
the server, so it is getting called somehow.

Cheers,

Jeff

Re: pg_dump and thousands of schemas

From
Tom Lane
Date:
Jeff Janes <jeff.janes@gmail.com> writes:
> For dumping entire databases, It looks like the biggest problem is
> going to be LockReassignCurrentOwner in the server.  And that doesn't
> seem to be easy to fix, as any change to it to improve pg_dump will
> risk degrading normal use cases.

> If we want to be able to efficiently dump entire databases in a
> scalable way, it seems like there should be some way to obtain a
> data-base-wide AccessShare lock, which blocks AccessExclusive locks on
> any object in the database, and turns ordinary object-level
> AccessShare lock requests into no-ops.

I thought a little bit about that, but it seems fairly unworkable.
In the first place, pg_dump doesn't necessarily want lock on every table
in the database.  In the second, such a lock mechanism would have
logical difficulties, notably whether it would be considered to apply to
tables created after the lock request occurs.  If it does, then it would
effectively block all such creations (since creation takes exclusive
locks that ought to conflict).  If it doesn't, how would you implement
that?  In any case, we'd be adding significant cost and complexity to
lock acquisition operations, for something that only whole-database
pg_dump operations could conceivably make use of.

As far as the specific problem at hand goes, I think there might be a
less invasive solution.  I poked into the behavior with gdb (and you're
right, LockReassignCurrentOwner does get called during portal drop)
and noted that although pg_dump is indeed holding thousands of locks,
any given statement that it issues touches only a few of them.  So the
loop in LockReassignCurrentOwner iterates over the whole lock table but
does something useful at only a few entries.

We could fix things for this usage pattern with what seems to me to
be a pretty low-overhead method: add a fixed-size array to
ResourceOwners in which we can remember up to N LOCALLOCKs, for N around
10 or so.  Add a LOCALLOCK to that array when we add the ResourceOwner to
that LOCALLOCK, so long as the array hasn't overflowed.  (If the array
does overflow, we mark it as overflowed and stop adding entries.)  Then,
in LockReassignCurrentOwner, we only iterate over the whole hash table
if the ResourceOwner's array has overflowed.  If it hasn't, use the
array to visit just the LOCALLOCKs that need work.

Comments?

            regards, tom lane

Re: pg_dump and thousands of schemas

From
Jeff Janes
Date:
On Fri, May 25, 2012 at 1:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> For dumping entire databases, It looks like the biggest problem is
>> going to be LockReassignCurrentOwner in the server.  And that doesn't
>> seem to be easy to fix, as any change to it to improve pg_dump will
>> risk degrading normal use cases.
>
>> If we want to be able to efficiently dump entire databases in a
>> scalable way, it seems like there should be some way to obtain a
>> data-base-wide AccessShare lock, which blocks AccessExclusive locks on
>> any object in the database, and turns ordinary object-level
>> AccessShare lock requests into no-ops.
>
> I thought a little bit about that, but it seems fairly unworkable.
> In the first place, pg_dump doesn't necessarily want lock on every table
> in the database.

The database-wide method could be invoked only when there are no
options given to pg_dump that limit to a subset.  Or does that not
resolve the objection?

> In the second, such a lock mechanism would have
> logical difficulties, notably whether it would be considered to apply to
> tables created after the lock request occurs.  If it does, then it would
> effectively block all such creations (since creation takes exclusive
> locks that ought to conflict).

That seems acceptable to me.  With an unrestricted dump, almost all other
DDL is locked out already; I don't know that locking out one more
thing is that big a deal.  Especially if there is some way to
circumvent the use of that feature.

> If it doesn't, how would you implement
> that?  In any case, we'd be adding significant cost and complexity to
> lock acquisition operations, for something that only whole-database
> pg_dump operations could conceivably make use of.

Before Robert's fast-path locks were developed, I wanted a way to put
the server into 'stable schema' mode where AccessExclusive locks were
forbidden and AccessShared were no-ops, just for performance reasons.
Now with fast-path, that might no longer be a meaningful feature.

If databases scale out a lot, won't max_locks_per_transaction, and the
amount of shared memory it would require to keep increasing it, become
a substantial problem?

> As far as the specific problem at hand goes, I think there might be a
> less invasive solution.  I poked into the behavior with gdb (and you're
> right, LockReassignCurrentOwner does get called during portal drop)
> and noted that although pg_dump is indeed holding thousands of locks,
> any given statement that it issues touches only a few of them.  So the
> loop in LockReassignCurrentOwner iterates over the whole lock table but
> does something useful at only a few entries.
>
> We could fix things for this usage pattern with what seems to me to
> be a pretty low-overhead method: add a fixed-size array to
> ResourceOwners in which we can remember up to N LOCALLOCKs, for N around
> 10 or so.

I had thought along these lines too.  I think 10 would capture most of
the gain.  With pg_dump, so far I see a huge number of resource owners
whose maximum number of locks is 0, 2 or 4, and only a handful with
more than 4.  Of course I haven't looked at all use cases.

The reason we want to limit at all is not memory, but rather so that
explicitly removing locks doesn't have to dig through a large list to
find the specific one to remove, and therefore become quadratic in the
case where many locks are explicitly removed, right?  Does anyone ever
add a bunch of locks and then afterward go through and explicitly
remove them all in FIFO order?  I think most users would either remove
them LIFO or drop them in bulk.  But better safe than sorry.

> Add a LOCALLOCK to that array when we add the ResourceOwner to
> that LOCALLOCK, so long as the array hasn't overflowed.  (If the array
> does overflow, we mark it as overflowed and stop adding entries.)  Then,
> in LockReassignCurrentOwner, we only iterate over the whole hash table
> if the ResourceOwner's array has overflowed.  If it hasn't, use the
> array to visit just the LOCALLOCKs that need work.
>
> Comments?

I have some basic parts of this already coded up.  I can try to finish
coding it up for the next CF or the one after.  I'm not yet sure how to avoid
weakening the boundary between resowner.c and lock.c; my original code
was pretty ugly there, as it was just a proof of concept.

What would be a situation that might be adversely affected by the
overhead of such a change?  I think pgbench -S except implemented in a
plpgsql loop would probably do it.

Cheers,

Jeff

Re: pg_dump and thousands of schemas

From
Tatsuo Ishii
Date:
> "Hugo <Nabble>" <hugo.tech@gmail.com> writes:
>> If anyone has more suggestions, I would like to hear them. Thank you!
>
> Provide a test case?
>
> We recently fixed a couple of O(N^2) loops in pg_dump, but those covered
> extremely specific cases that might or might not have anything to do
> with what you're seeing.  The complainant was extremely helpful about
> tracking down the problems:
> http://archives.postgresql.org/pgsql-general/2012-03/msg00957.php
> http://archives.postgresql.org/pgsql-committers/2012-03/msg00225.php
> http://archives.postgresql.org/pgsql-committers/2012-03/msg00230.php

I'm wondering if these fixes (or today's commit) cover the case of a
database that has ~100 thousand tables and indexes. One of my customers
has had trouble with pg_dump on such a database; it takes over 10
hours.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Re: pg_dump and thousands of schemas

From
"Hugo "
Date:
Here is a sample dump that takes a long time to be written by pg_dump:
http://postgresql.1045698.n5.nabble.com/file/n5710183/test.dump.tar.gz
(the file above is 2.4 MB; the dump itself is 66 MB)

This database has 2,311 schemas similar to those in my production database.
All schemas are empty, but pg_dump still takes 3 hours to finish it on my
computer. So now you can imagine my production database with more than
20,000 schemas like that. Can you guys take a look and see if the code has
room for improvements? I generated this dump with postgresql 9.1 (which is
what I have on my local computer), but my production database uses
postgresql 9.0. So it would be great if improvements could be delivered to
version 9.0 as well.

Thanks a lot for all the help!

Hugo


Re: pg_dump and thousands of schemas

From
Jeff Janes
Date:
On Sat, May 26, 2012 at 9:12 PM, Hugo <Nabble> <hugo.tech@gmail.com> wrote:
> Here is a sample dump that takes a long time to be written by pg_dump:
> http://postgresql.1045698.n5.nabble.com/file/n5710183/test.dump.tar.gz
> test.dump.tar.gz
> (the file above has 2.4Mb, the dump itself has 66Mb)
>
> This database has 2,311 schemas similar to those in my production database.
> All schemas are empty,

This dump does not reload cleanly.  It uses many roles which it
doesn't create.  Also, the schemata are not empty; they have about 20
tables apiece.

I created the missing roles with all default options.

Doing a default pg_dump took 66 minutes.

> but pg_dump still takes 3 hours to finish it on my
> computer. So now you can imagine my production database with more than
> 20,000 schemas like that. Can you guys take a look and see if the code has
> room for improvements?

There is a quadratic behavior in pg_dump's "mark_create_done".  This
should probably be fixed, but in the meantime it can be circumvented
by using -Fc rather than -Fp for the dump format.  Doing that removed
17 minutes from the run time.

I'm working on a patch to reduce the LockReassignCurrentOwner problem
in the server when using pg_dump with lots of objects.  Using a
preliminary version for this, in conjunction with -Fc, reduced the
dump time to 3.5 minutes.

Cheers,

Jeff

Re: pg_dump and thousands of schemas

From
Tom Lane
Date:
Jeff Janes <jeff.janes@gmail.com> writes:
> There is a quadratic behavior in pg_dump's "mark_create_done".  This
> should probably be fixed, but in the mean time it can be circumvented
> by using -Fc rather than -Fp for the dump format.  Doing that removed
> 17 minutes from the run time.

Hmm, that would just amount to postponing the work from pg_dump to
pg_restore --- although I suppose it could be a win if the dump is for
backup purposes and you probably won't ever have to restore it.
inhibit_data_for_failed_table() has the same issue, though perhaps it's
less likely to be exercised; and there is a previously noted O(N^2)
behavior for the loop around repoint_table_dependencies.

We could fix these things by setting up index arrays that map dump ID
to TocEntry pointer and dump ID of a table to dump ID of its TABLE DATA
TocEntry.  The first of these already exists (tocsByDumpId) but is
currently built only if doing parallel restore.  We'd have to build it
all the time to use it for fixing mark_create_done.  Still, the extra
space is small compared to the size of the TocEntry data structures,
so I don't see that that's a serious objection.

I have nothing else to do right now so am a bit tempted to go fix this.

> I'm working on a patch to reduce the LockReassignCurrentOwner problem
> in the server when using pg_dump with lots of objects.

Cool.

            regards, tom lane

Re: pg_dump and thousands of schemas

From
"Hugo "
Date:
Thanks again for the hard work, guys.

When I said that the schemas were empty, I was talking about data, not
tables. So you are right that each schema has ~20 tables (plus indices,
sequences, etc.), but pretty much no data (maybe one or two rows at most).
Data doesn't seem to be so important in this case (I may be wrong though),
so the sample database should be enough to find the weak spots that need
attention.

> but in the mean time it can be circumvented
> by using -Fc rather than -Fp for the dump format.
> Doing that removed 17 minutes from the run time.

We do use -Fc in our production server, but it doesn't help much (dump time
still > 24 hours). Actually, I tried several different dump options without
success. It seems that you guys are very close to great improvements here.
Thanks for everything!

Best,
Hugo


Re: pg_dump and thousands of schemas

From
Tatsuo Ishii
Date:
>> We recently fixed a couple of O(N^2) loops in pg_dump, but those covered
>> extremely specific cases that might or might not have anything to do
>> with what you're seeing.  The complainant was extremely helpful about
>> tracking down the problems:
>> http://archives.postgresql.org/pgsql-general/2012-03/msg00957.php
>> http://archives.postgresql.org/pgsql-committers/2012-03/msg00225.php
>> http://archives.postgresql.org/pgsql-committers/2012-03/msg00230.php
>
> I'm wondering if these fixes (or today's commit) include the case for
> a database has ~100 thounsands of tables, indexes. One of my customers
> has had troubles with pg_dump for the database, it takes over 10
> hours.

So I did a quick test with old PostgreSQL 9.0.2 and current (as of
commit 2755abf386e6572bad15cb6a032e504ad32308cc). In a freshly initdb'd
database I created 100,000 tables, each with two integer
attributes, one of them a primary key. Creating the tables was
reasonably fast, as expected (18-20 minutes). This created a 1.4GB
database cluster.

pg_dump dbname >/dev/null took 188 minutes on 9.0.2, which was a pretty
long time, as the customer complained. And what about current?  Well, it
took 125 minutes. ps showed that most of the time was spent in the backend.

Below is the script to create tables.

cnt=100000
while [ $cnt -gt 0 ]
do
psql -e -p 5432 -c "create table t$cnt(i int primary key, j int);" test
cnt=`expr $cnt - 1`
done

p.s. You need to increase max_locks_per_transaction before running
pg_dump (I raised it to 640 in my case).
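For anyone reproducing this, the current value and the live lock count are easy to inspect; note that raising the setting means editing postgresql.conf and restarting the server, since it sizes shared memory:

-- Check the current setting and how many locks are held right now:
SHOW max_locks_per_transaction;
SELECT count(*) FROM pg_locks;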
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Re: pg_dump and thousands of schemas

From
Tom Lane
Date:
Tatsuo Ishii <ishii@postgresql.org> writes:
> So I did qucik test with old PostgreSQL 9.0.2 and current (as of
> commit 2755abf386e6572bad15cb6a032e504ad32308cc). In a fresh initdb-ed
> database I created 100,000 tables, and each has two integer
> attributes, one of them is a primary key. Creating tables were
> resonably fast as expected (18-20 minutes). This created a 1.4GB
> database cluster.

> pg_dump dbname >/dev/null took 188 minutes on 9.0.2, which was pretty
> long time as the customer complained. Now what was current?  Well it
> took 125 minutes. Ps showed that most of time was spent in backend.

Yeah, Jeff's experiments indicated that the remaining bottleneck is lock
management in the server.  What I fixed so far on the pg_dump side
should be enough to let partial dumps run at reasonable speed even if
the whole database contains many tables.  But if psql is taking
AccessShareLock on lots of tables, there's still a problem.

            regards, tom lane

Re: pg_dump and thousands of schemas

From
Tatsuo Ishii
Date:
> Yeah, Jeff's experiments indicated that the remaining bottleneck is lock
> management in the server.  What I fixed so far on the pg_dump side
> should be enough to let partial dumps run at reasonable speed even if
> the whole database contains many tables.  But if psql is taking
> AccessShareLock on lots of tables, there's still a problem.

Yes, I saw lines like this:

29260 2012-05-30 09:39:19 JST LOG:  statement: LOCK TABLE public.t10 IN ACCESS SHARE MODE

It seems this is not a very efficient approach, since LOCK TABLE can take
multiple tables as an argument and we could pass as many tables as
possible in one LOCK TABLE query. This way we could reduce the
communication between pg_dump and the backend.
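In other words, roughly one statement and one round trip covering many tables instead of one per table (the table names here are placeholders):

LOCK TABLE public.t1, public.t2, public.t3 IN ACCESS SHARE MODE;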

Also I noticed lots of queries like these:

29260 2012-05-30 09:39:19 JST LOG:  statement: SELECT attname, attacl FROM pg_catalog.pg_attribute WHERE attrelid = '516391' AND NOT attisdropped AND attacl IS NOT NULL ORDER BY attnum

I guess this is issued for each table, and if there are tons of tables
these queries are a major bottleneck, as well as the LOCKs. I think we
could optimize this somewhat by issuing queries that extract info for
multiple tables at once, rather than extracting info for only one table
at a time as the current implementation does.
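For instance, the per-table ACL query above could be batched along these lines (just a sketch; the OIDs are placeholders and would come from the set of tables pg_dump has already selected):

-- Fetch column ACLs for a whole batch of tables in one query:
SELECT attrelid, attnum, attname, attacl
FROM pg_catalog.pg_attribute
WHERE attrelid IN (516391, 516392, 516393)
  AND NOT attisdropped
  AND attacl IS NOT NULL
ORDER BY attrelid, attnum;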

Or, even better, we could create a temp table containing the target
table OIDs and join it against the query above.
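Roughly, again as a sketch with placeholder OIDs:

-- Load the target table OIDs once, then join against them:
CREATE TEMP TABLE dump_target (relid oid);
INSERT INTO dump_target VALUES (516391), (516392), (516393);

SELECT a.attrelid, a.attnum, a.attname, a.attacl
FROM pg_catalog.pg_attribute a
JOIN dump_target t ON t.relid = a.attrelid
WHERE NOT a.attisdropped
  AND a.attacl IS NOT NULL
ORDER BY a.attrelid, a.attnum;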

In my opinion, particular use cases such as multi-tenancy will create
tons of objects in a database cluster, and the performance of pg_dump
will be highlighted even more in the future.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Re: pg_dump and thousands of schemas

From
Tatsuo Ishii
Date:
> Yeah, Jeff's experiments indicated that the remaining bottleneck is lock
> management in the server.  What I fixed so far on the pg_dump side
> should be enough to let partial dumps run at reasonable speed even if
> the whole database contains many tables.  But if psql is taking
> AccessShareLock on lots of tables, there's still a problem.

OK, I modified the part of pg_dump where a tremendous number of LOCK
TABLE statements are issued. I replaced them with a single LOCK TABLE
listing multiple tables. With 100k tables, the LOCK statements took 13
minutes in total; now they take only 3 seconds. Comments?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 3461f3e..cc1ffd7 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -3832,6 +3832,7 @@ getTables(Archive *fout, int *numTables)
     int            i_reloptions;
     int            i_toastreloptions;
     int            i_reloftype;
+    bool        lock_needed = false;

     /* Make sure we are in proper schema */
     selectSourceSchema(fout, "pg_catalog");
@@ -4273,15 +4274,21 @@ getTables(Archive *fout, int *numTables)
          * NOTE: it'd be kinda nice to lock other relations too, not only
          * plain tables, but the backend doesn't presently allow that.
          */
-        if (tblinfo[i].dobj.dump && tblinfo[i].relkind == RELKIND_RELATION)
+        if (i == 0)
         {
             resetPQExpBuffer(query);
-            appendPQExpBuffer(query,
-                              "LOCK TABLE %s IN ACCESS SHARE MODE",
+            appendPQExpBuffer(query,"LOCK TABLE ");
+        }
+
+        if (tblinfo[i].dobj.dump && tblinfo[i].relkind == RELKIND_RELATION)
+        {
+            if (lock_needed)
+                appendPQExpBuffer(query,",");
+            appendPQExpBuffer(query,"%s",
                          fmtQualifiedId(fout,
                                         tblinfo[i].dobj.namespace->dobj.name,
                                         tblinfo[i].dobj.name));
-            ExecuteSqlStatement(fout, query->data);
+            lock_needed = true;
         }

         /* Emit notice if join for owner failed */
@@ -4290,6 +4297,12 @@ getTables(Archive *fout, int *numTables)
                       tblinfo[i].dobj.name);
     }

+    if (lock_needed)
+    {
+        appendPQExpBuffer(query, " IN ACCESS SHARE MODE");
+        ExecuteSqlStatement(fout, query->data);
+    }
+
     if (lockWaitTimeout && fout->remoteVersion >= 70300)
     {
         ExecuteSqlStatement(fout, "SET statement_timeout = 0");

Re: pg_dump and thousands of schemas

From
Tatsuo Ishii
Date:
>> Yeah, Jeff's experiments indicated that the remaining bottleneck is lock
>> management in the server.  What I fixed so far on the pg_dump side
>> should be enough to let partial dumps run at reasonable speed even if
>> the whole database contains many tables.  But if psql is taking
>> AccessShareLock on lots of tables, there's still a problem.
>
> Ok, I modified the part of pg_dump where tremendous number of LOCK
> TABLE are issued. I replace them with single LOCK TABLE with multiple
> tables. With 100k tables LOCK statements took 13 minutes in total, now
> it only takes 3 seconds. Comments?

Shall I commit to master and all supported branches?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Re: pg_dump and thousands of schemas

From
Tom Lane
Date:
Tatsuo Ishii <ishii@postgresql.org> writes:
>> Ok, I modified the part of pg_dump where tremendous number of LOCK
>> TABLE are issued. I replace them with single LOCK TABLE with multiple
>> tables. With 100k tables LOCK statements took 13 minutes in total, now
>> it only takes 3 seconds. Comments?

> Shall I commit to master and all supported branches?

I'm not excited by this patch.  It dodges the O(N^2) lock behavior for
the initial phase of acquiring the locks, but it does nothing for the
lock-related slowdown occurring in all pg_dump's subsequent commands.
I think we really need to get in the server-side fix that Jeff Janes is
working on, and then re-measure to see if something like this is still
worth the trouble.  I am also a tad concerned about whether we might not
have problems with parsing memory usage, or some such, with thousands of
tables being listed in a single command.

            regards, tom lane

Re: pg_dump and thousands of schemas

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Tatsuo Ishii <ishii@postgresql.org> writes:
> > Shall I commit to master and all supported branches?
>
> I'm not excited by this patch.  It dodges the O(N^2) lock behavior for
> the initial phase of acquiring the locks, but it does nothing for the
> lock-related slowdown occurring in all pg_dump's subsequent commands.
> I think we really need to get in the server-side fix that Jeff Janes is
> working on, and then re-measure to see if something like this is still
> worth the trouble.  I am also a tad concerned about whether we might not
> have problems with parsing memory usage, or some such, with thousands of
> tables being listed in a single command.

I can't imagine a case where it's actually better to incur the latency
penalty (which is apparently on the order of *minutes* of additional
time here..) than to worry about the potential memory usage of having to
parse such a command.

If that's really a concern, where is that threshold, and could we simply
cap pg_dump's operations based on it?  Is 1000 alright?  Doing a 'lock'
w/ 1000 tables at a time is still going to be hugely better than doing
them individually and the amount of gain between every-1000 and
all-at-once is likely to be pretty minimal anyway...

The current situation where the client-to-server latency accounts for
multiple minutes of time is just ridiculous, however, so I feel we need
some form of this patch, even if the server side is magically made much
faster.  The constant back-and-forth isn't cheap.

    Thanks,

        Stephen


Re: pg_dump and thousands of schemas

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> The current situation where the client-to-server latency accounts for
> multiple minutes of time is just ridiculous, however, so I feel we need
> some form of this patch, even if the server side is magically made much
> faster.  The constant back-and-forth isn't cheap.

No, you're missing my point.  I don't believe that client-to-server
latency, or any other O(N) cost, has anything to do with the problem
here.  The problem, as Jeff has demonstrated, is the O(N^2) costs
associated with management of the local lock table.  It is utterly
pointless to worry about O(N) costs until that's fixed; and it's just
wrong to claim that you've created a significant speedup by eliminating
a constant factor when all you've done is staved off occurrences of the
O(N^2) problem.

Once we've gotten rid of the local lock table problem, we can re-measure
and see what the true benefit of this patch is.  I'm of the opinion
that it will be in the noise compared to the overall runtime of pg_dump.
I could be wrong, but you won't convince me of that with measurements
taken while the local lock table problem is still there.

            regards, tom lane

Re: pg_dump and thousands of schemas

From
Tatsuo Ishii
Date:
> I'm not excited by this patch.  It dodges the O(N^2) lock behavior for
> the initial phase of acquiring the locks, but it does nothing for the
> lock-related slowdown occurring in all pg_dump's subsequent commands.
> I think we really need to get in the server-side fix that Jeff Janes is
> working on, and then re-measure to see if something like this is still
> worth the trouble.

Well, even with the current backend, locking 100,000 tables was done
in 3 seconds in my test. So even if Jeff Janes's fix succeeds, I
guess it will just save 3 seconds in my case, and if the number of
tables is smaller, the saving will be smaller. This suggests that most
of the time for processing LOCK has been spent in communication between
pg_dump and the backend. Of course this is just my guess, though.

> I am also a tad concerned about whether we might not
> have problems with parsing memory usage, or some such, with thousands of
> tables being listed in a single command.

That's easy to fix. Just split the single big LOCK statement into
multiple LOCK statements.

My big concern is that even if the locking part is fixed (either by Jeff
Janes's fix or by mine), much time in pg_dump is still spent on SELECTs
against the system catalogs. The fix would be to turn many SELECTs into a
single SELECT, probably using a big IN clause of table OIDs.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Re: pg_dump and thousands of schemas

From
Tatsuo Ishii
Date:
>>> We recently fixed a couple of O(N^2) loops in pg_dump, but those covered
>>> extremely specific cases that might or might not have anything to do
>>> with what you're seeing.  The complainant was extremely helpful about
>>> tracking down the problems:
>>> http://archives.postgresql.org/pgsql-general/2012-03/msg00957.php
>>> http://archives.postgresql.org/pgsql-committers/2012-03/msg00225.php
>>> http://archives.postgresql.org/pgsql-committers/2012-03/msg00230.php
>>
>> I'm wondering if these fixes (or today's commit) include the case for
>> a database has ~100 thounsands of tables, indexes. One of my customers
>> has had troubles with pg_dump for the database, it takes over 10
>> hours.
>
> So I did qucik test with old PostgreSQL 9.0.2 and current (as of
> commit 2755abf386e6572bad15cb6a032e504ad32308cc). In a fresh initdb-ed
> database I created 100,000 tables, and each has two integer
> attributes, one of them is a primary key. Creating tables were
> resonably fast as expected (18-20 minutes). This created a 1.4GB
> database cluster.
>
> pg_dump dbname >/dev/null took 188 minutes on 9.0.2, which was pretty
> long time as the customer complained. Now what was current?  Well it
> took 125 minutes. Ps showed that most of time was spent in backend.
>
> Below is the script to create tables.
>
> cnt=100000
> while [ $cnt -gt 0 ]
> do
> psql -e -p 5432 -c "create table t$cnt(i int primary key, j int);" test
> cnt=`expr $cnt - 1`
> done
>
> p.s. You need to increate max_locks_per_transaction before running
> pg_dump (I raised to 640 in my case).

Just for the record, I reran the test with my single-LOCK patch, and
now the total runtime of pg_dump is 113 minutes:
188 minutes (9.0) -> 125 minutes (git master) -> 113 minutes (with my patch).

So far, I'm glad to see 40% time savings at this point.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Re: pg_dump and thousands of schemas

From
Robert Klemme
Date:
On Thu, May 31, 2012 at 10:45 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
> Just for record, I rerun the test again with my single-LOCK patch, and
> now total runtime of pg_dump is 113 minutes.
> 188 minutes(9.0)->125 minutes(git master)->113 minutes(with my patch).
>
> So far, I'm glad to see 40% time savings at this point.

I see only 9.6% savings (100 * (113/125 - 1)).  What am I missing?

Cheers

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Re: pg_dump and thousands of schemas

From
Tatsuo Ishii
Date:
> On Thu, May 31, 2012 at 10:45 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>> Just for record, I rerun the test again with my single-LOCK patch, and
>> now total runtime of pg_dump is 113 minutes.
>> 188 minutes(9.0)->125 minutes(git master)->113 minutes(with my patch).
>>
>> So far, I'm glad to see 40% time savings at this point.
>
> I see only 9.6% savings (100 * (113/125 - 1)).  What am I missing?

What I meant was (100 * (113/188 - 1)).
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Re: pg_dump and thousands of schemas

From
Robert Klemme
Date:
On Thu, May 31, 2012 at 4:07 PM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>> On Thu, May 31, 2012 at 10:45 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>>> Just for record, I rerun the test again with my single-LOCK patch, and
>>> now total runtime of pg_dump is 113 minutes.
>>> 188 minutes(9.0)->125 minutes(git master)->113 minutes(with my patch).
>>>
>>> So far, I'm glad to see 40% time savings at this point.
>>
>> I see only 9.6% savings (100 * (113/125 - 1)).  What am I missing?
>
> What I meant was (100 * (113/188 - 1)).

OK, my fault was to assume you wanted to measure only your part, while
apparently you meant overall savings.  But Tom had asked for separate
measurements if I understood him correctly.  Also, that measurement of
your change would go after the O(N^2) fix.  It could actually turn out
to be much more than 9% because the overall time would be reduced even
more dramatic.  So it might actually be good for your fix to wait a
bit. ;-)

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Re: pg_dump and thousands of schemas

From
Claudio Freire
Date:
On Thu, May 31, 2012 at 11:17 AM, Robert Klemme
<shortcutter@googlemail.com> wrote:
>
> OK, my fault was to assume you wanted to measure only your part, while
> apparently you meant overall savings.  But Tom had asked for separate
> measurements if I understood him correctly.  Also, that measurement of
> your change would go after the O(N^2) fix.  It could actually turn out
> to be much more than 9% because the overall time would be reduced even
> more dramatic.  So it might actually be good for your fix to wait a
> bit. ;-)

It's not clear whether Tom is already working on that O(N^2) fix in locking.

I'm asking because it doesn't seem like a complicated patch;
contributors may want to get working on it if not ;-)

Re: [HACKERS] pg_dump and thousands of schemas

From
Tom Lane
Date:
Claudio Freire <klaussfreire@gmail.com> writes:
> It's not clear whether Tom is already working on that O(N^2) fix in locking.

I'm not; Jeff Janes is.  But you shouldn't be holding your breath
anyway, since it's 9.3 material at this point.

            regards, tom lane

Re: [HACKERS] pg_dump and thousands of schemas

From
Robert Haas
Date:
On Thu, May 31, 2012 at 10:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Claudio Freire <klaussfreire@gmail.com> writes:
>> It's not clear whether Tom is already working on that O(N^2) fix in locking.
>
> I'm not; Jeff Janes is.  But you shouldn't be holding your breath
> anyway, since it's 9.3 material at this point.

I agree we can't back-patch that change, but then I think we ought to
consider back-patching some variant of Tatsuo's patch.  Maybe it's not
reasonable to thunk an arbitrary number of relation names in there on
one line, but how about 1000 relations per LOCK statement or so?  I
guess we'd need to see how much that erodes the benefit, but we've
certainly done back-branch rearrangements in pg_dump in the past to
fix various kinds of issues, and this is pretty non-invasive.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] pg_dump and thousands of schemas

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, May 31, 2012 at 10:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm not; Jeff Janes is.  But you shouldn't be holding your breath
>> anyway, since it's 9.3 material at this point.

> I agree we can't back-patch that change, but then I think we ought to
> consider back-patching some variant of Tatsuo's patch.  Maybe it's not
> reasonable to thunk an arbitrary number of relation names in there on
> one line, but how about 1000 relations per LOCK statement or so?  I
> guess we'd need to see how much that erodes the benefit, but we've
> certainly done back-branch rearrangements in pg_dump in the past to
> fix various kinds of issues, and this is pretty non-invasive.

I am not convinced either that this patch will still be useful after
Jeff's fix goes in, or that it provides any meaningful savings when
you consider a complete pg_dump run.  Yeah, it will make the lock
acquisition phase faster, but that's not a big part of the runtime
except in very limited scenarios (--schema-only, perhaps).

The performance patches we applied to pg_dump over the past couple weeks
were meant to relieve pain in situations where the big server-side
lossage wasn't the dominant factor in runtime (ie, partial dumps).
But this one is targeting exactly that area, which is why it looks like
a band-aid and not a fix to me.

            regards, tom lane

Re: [HACKERS] pg_dump and thousands of schemas

From
Bruce Momjian
Date:
On Thu, May 31, 2012 at 10:50:51AM -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Thu, May 31, 2012 at 10:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I'm not; Jeff Janes is.  But you shouldn't be holding your breath
> >> anyway, since it's 9.3 material at this point.
>
> > I agree we can't back-patch that change, but then I think we ought to
> > consider back-patching some variant of Tatsuo's patch.  Maybe it's not
> > reasonable to thunk an arbitrary number of relation names in there on
> > one line, but how about 1000 relations per LOCK statement or so?  I
> > guess we'd need to see how much that erodes the benefit, but we've
> > certainly done back-branch rearrangements in pg_dump in the past to
> > fix various kinds of issues, and this is pretty non-invasive.
>
> I am not convinced either that this patch will still be useful after
> Jeff's fix goes in, or that it provides any meaningful savings when
> you consider a complete pg_dump run.  Yeah, it will make the lock
> acquisition phase faster, but that's not a big part of the runtime
> except in very limited scenarios (--schema-only, perhaps).

FYI, that is the pg_upgrade use-case, and pg_dump/restore time is
reportedly taking the majority of time in many cases.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Re: [HACKERS] pg_dump and thousands of schemas

From
Claudio Freire
Date:
On Thu, May 31, 2012 at 11:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The performance patches we applied to pg_dump over the past couple weeks
> were meant to relieve pain in situations where the big server-side
> lossage wasn't the dominant factor in runtime (ie, partial dumps).
> But this one is targeting exactly that area, which is why it looks like
> a band-aid and not a fix to me.

No, Tatsuo's patch attacks a phase dominated by latency in some
setups. That it's also currently slow because of the locking
cost is irrelevant; with locking sped up, the patch should only
improve that phase even further. Imagine the current timeline:

* = locking
. = waiting

*.*.**.**.***.***.****.****.*****.****

Tatsuo's patch converts it to:

*.**************

The locking fix would turn the timeline into:

*.*.*.*.*.*.*

Tatsuo's patch would turn that into:

*******

And, as noted before, pg_dump --schema-only is a key bottleneck in pg_upgrade.

Re: [HACKERS] pg_dump and thousands of schemas

From
Robert Haas
Date:
On Thu, May 31, 2012 at 10:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, May 31, 2012 at 10:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I'm not; Jeff Janes is.  But you shouldn't be holding your breath
>>> anyway, since it's 9.3 material at this point.
>
>> I agree we can't back-patch that change, but then I think we ought to
>> consider back-patching some variant of Tatsuo's patch.  Maybe it's not
>> reasonable to thunk an arbitrary number of relation names in there on
>> one line, but how about 1000 relations per LOCK statement or so?  I
>> guess we'd need to see how much that erodes the benefit, but we've
>> certainly done back-branch rearrangements in pg_dump in the past to
>> fix various kinds of issues, and this is pretty non-invasive.
>
> I am not convinced either that this patch will still be useful after
> Jeff's fix goes in, ...

But people on older branches are not going to GET Jeff's fix.

> or that it provides any meaningful savings when
> you consider a complete pg_dump run.  Yeah, it will make the lock
> acquisition phase faster, but that's not a big part of the runtime
> except in very limited scenarios (--schema-only, perhaps).

That is not a borderline scenario, as others have also pointed out.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] pg_dump and thousands of schemas

From
Bruce Momjian
Date:
On Thu, May 31, 2012 at 11:04:12AM -0400, Robert Haas wrote:
> On Thu, May 31, 2012 at 10:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> >> On Thu, May 31, 2012 at 10:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>> I'm not; Jeff Janes is.  But you shouldn't be holding your breath
> >>> anyway, since it's 9.3 material at this point.
> >
> >> I agree we can't back-patch that change, but then I think we ought to
> >> consider back-patching some variant of Tatsuo's patch.  Maybe it's not
> >> reasonable to thunk an arbitrary number of relation names in there on
> >> one line, but how about 1000 relations per LOCK statement or so?  I
> >> guess we'd need to see how much that erodes the benefit, but we've
> >> certainly done back-branch rearrangements in pg_dump in the past to
> >> fix various kinds of issues, and this is pretty non-invasive.
> >
> > I am not convinced either that this patch will still be useful after
> > Jeff's fix goes in, ...
>
> But people on older branches are not going to GET Jeff's fix.

FYI, if it got into Postgres 9.2, everyone upgrading to Postgres 9.2
would benefit because pg_upgrade uses the new cluster's pg_dumpall.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Re: [HACKERS] pg_dump and thousands of schemas

From
Tom Lane
Date:
Claudio Freire <klaussfreire@gmail.com> writes:
> On Thu, May 31, 2012 at 11:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The performance patches we applied to pg_dump over the past couple weeks
>> were meant to relieve pain in situations where the big server-side
>> lossage wasn't the dominant factor in runtime (ie, partial dumps).
>> But this one is targeting exactly that area, which is why it looks like
>> a band-aid and not a fix to me.

> No, Tatsuo's patch attacks a phase dominated by latency in some
> setups.

No, it does not.  The reason it's a win is that it avoids the O(N^2)
behavior in the server.  Whether the bandwidth savings is worth worrying
about cannot be proven one way or the other as long as that elephant
is in the room.

            regards, tom lane

Re: [HACKERS] pg_dump and thousands of schemas

From
Claudio Freire
Date:
On Thu, May 31, 2012 at 12:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> No, Tatsuo's patch attacks a phase dominated by latency in some
>> setups.
>
> No, it does not.  The reason it's a win is that it avoids the O(N^2)
> behavior in the server.  Whether the bandwidth savings is worth worrying
> about cannot be proven one way or the other as long as that elephant
> is in the room.
>
>                        regards, tom lane

I understand that, but if the locking is fixed and made to be O(N)
(and hence each table locking O(1)), then latency suddenly becomes the
dominating factor.

I'm thinking, though, that pg_upgrade runs locally, unlike pg_dump
backups, so in that case latency would be negligible and Tatsuo's
patch inconsequential.

I'm also wondering whether the ResourceOwner patch you've proposed
would get negated by Tatsuo's patch, because suddenly a "portal"
(IIRC) has a lot more locks than ResourceOwner could accommodate,
forcing a reversal to O(N²) behavior. In that case, that patch would
in fact be detrimental... huh... way to go 180

Re: pg_dump and thousands of schemas

From
Jeff Janes
Date:
On Wed, May 30, 2012 at 2:06 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>> Yeah, Jeff's experiments indicated that the remaining bottleneck is lock
>> management in the server.  What I fixed so far on the pg_dump side
>> should be enough to let partial dumps run at reasonable speed even if
>> the whole database contains many tables.  But if psql is taking
>> AccessShareLock on lots of tables, there's still a problem.
>
> Ok, I modified the part of pg_dump where tremendous number of LOCK
> TABLE are issued. I replace them with single LOCK TABLE with multiple
> tables. With 100k tables LOCK statements took 13 minutes in total, now
> it only takes 3 seconds. Comments?

Could you rebase this?  I tried doing it myself, but must have messed
it up because it got slower rather than faster.

Thanks,

Jeff

Re: pg_dump and thousands of schemas

From
Jeff Janes
Date:
On Sun, Jun 10, 2012 at 4:47 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Wed, May 30, 2012 at 2:06 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>>> Yeah, Jeff's experiments indicated that the remaining bottleneck is lock
>>> management in the server.  What I fixed so far on the pg_dump side
>>> should be enough to let partial dumps run at reasonable speed even if
>>> the whole database contains many tables.  But if psql is taking
>>> AccessShareLock on lots of tables, there's still a problem.
>>
>> Ok, I modified the part of pg_dump where tremendous number of LOCK
>> TABLE are issued. I replace them with single LOCK TABLE with multiple
>> tables. With 100k tables LOCK statements took 13 minutes in total, now
>> it only takes 3 seconds. Comments?
>
> Could you rebase this?  I tried doing it myself, but must have messed
> it up because it got slower rather than faster.

OK, I found the problem.  In fixing a merge conflict, I had it execute
the query every time it appended a table, rather than just at the end.

With my proposed patch in place, I find that for a full default dump
your patch is slightly faster with < 300,000 tables, and slightly
slower with > 300,000.  The differences are generally <2% in either
direction.  When it comes to back-patching and partial dumps, I'm not
really sure what to test.

For the record, there is still quadratic performance on the server,
albeit with a much smaller constant factor than the Reassign one.  It
is in get_tabstat_entry.  I don't know if it is worth working on that in
isolation--if PG is going to try to accommodate hundreds of thousands of
tables, there probably needs to be a more general way to limit the
memory used by all aspects of the rel caches.
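
For anyone who wants to reproduce this kind of test, a throwaway database of
roughly the shape discussed here (100,000 empty tables spread over 1,000
schemas; all names invented) can be generated with something like:

-- Hypothetical setup script; adjust the loop bounds to taste.
DO $$
DECLARE
    s integer;
    t integer;
BEGIN
    FOR s IN 1..1000 LOOP
        EXECUTE 'CREATE SCHEMA s' || s;
        FOR t IN 1..100 LOOP
            EXECUTE 'CREATE TABLE s' || s || '.t' || t || ' (id integer)';
        END LOOP;
    END LOOP;
END;
$$;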

Cheers,

Jeff

Re: pg_dump and thousands of schemas

From
Tatsuo Ishii
Date:
> On Sun, Jun 10, 2012 at 4:47 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> On Wed, May 30, 2012 at 2:06 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>>>> Yeah, Jeff's experiments indicated that the remaining bottleneck is lock
>>>> management in the server.  What I fixed so far on the pg_dump side
>>>> should be enough to let partial dumps run at reasonable speed even if
>>>> the whole database contains many tables.  But if psql is taking
>>>> AccessShareLock on lots of tables, there's still a problem.
>>>
>>> Ok, I modified the part of pg_dump where tremendous number of LOCK
>>> TABLE are issued. I replace them with single LOCK TABLE with multiple
>>> tables. With 100k tables LOCK statements took 13 minutes in total, now
>>> it only takes 3 seconds. Comments?
>>
>> Could you rebase this?  I tried doing it myself, but must have messed
>> it up because it got slower rather than faster.
>
> OK, I found the problem.  In fixing a merge conflict, I had it execute
> the query every time it appended a table, rather than just at the end.
>
> With my proposed patch in place, I find that for a full default dump
> your patch is slightly faster with < 300,000 tables, and slightly
> slower with > 300,000.  The differences are generally <2% in either
> direction.  When it comes to back-patching and partial dumps, I'm not
> really sure what to test.
>
> For the record, there is still a quadratic performance on the server,
> albeit with a much smaller constant factor than the Reassign one.  It
> is in get_tabstat_entry.  I don't know if is worth working on that in
> isolation--if PG is going to try to accommodate 100s of thousands of
> table, there probably needs to be a more general way to limit the
> memory used by all aspects of the rel caches.

I would like to test your patch with and without my patch. Could you
please give me the patches? Or do you have your own git repository?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Re: pg_dump and thousands of schemas

From
Jeff Janes
Date:
On Tue, Jun 12, 2012 at 1:54 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>> On Sun, Jun 10, 2012 at 4:47 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>>> On Wed, May 30, 2012 at 2:06 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>>>>> Yeah, Jeff's experiments indicated that the remaining bottleneck is lock
>>>>> management in the server.  What I fixed so far on the pg_dump side
>>>>> should be enough to let partial dumps run at reasonable speed even if
>>>>> the whole database contains many tables.  But if psql is taking
>>>>> AccessShareLock on lots of tables, there's still a problem.
>>>>
>>>> Ok, I modified the part of pg_dump where tremendous number of LOCK
>>>> TABLE are issued. I replace them with single LOCK TABLE with multiple
>>>> tables. With 100k tables LOCK statements took 13 minutes in total, now
>>>> it only takes 3 seconds. Comments?
>>>
>>> Could you rebase this?  I tried doing it myself, but must have messed
>>> it up because it got slower rather than faster.
>>
>> OK, I found the problem.  In fixing a merge conflict, I had it execute
>> the query every time it appended a table, rather than just at the end.
>>
>> With my proposed patch in place, I find that for a full default dump
>> your patch is slightly faster with < 300,000 tables, and slightly
>> slower with > 300,000.  The differences are generally <2% in either
>> direction.  When it comes to back-patching and partial dumps, I'm not
>> really sure what to test.
>>
>> For the record, there is still a quadratic performance on the server,
>> albeit with a much smaller constant factor than the Reassign one.  It
>> is in get_tabstat_entry.  I don't know if is worth working on that in
>> isolation--if PG is going to try to accommodate 100s of thousands of
>> table, there probably needs to be a more general way to limit the
>> memory used by all aspects of the rel caches.
>
> I would like to test your patch and w/without my patch. Could you
> please give me the patches? Or do you have your own git repository?

The main patch is in the commit fest as "Resource Owner reassign Locks
for the sake of pg_dump"

My re-basing of your patch is attached.

Cheers,

Jeff

Attachment

Re: pg_dump and thousands of schemas

From
Tatsuo Ishii
Date:
> On Tue, Jun 12, 2012 at 1:54 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>>> On Sun, Jun 10, 2012 at 4:47 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>>>> On Wed, May 30, 2012 at 2:06 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>>>>>> Yeah, Jeff's experiments indicated that the remaining bottleneck is lock
>>>>>> management in the server.  What I fixed so far on the pg_dump side
>>>>>> should be enough to let partial dumps run at reasonable speed even if
>>>>>> the whole database contains many tables.  But if psql is taking
>>>>>> AccessShareLock on lots of tables, there's still a problem.
>>>>>
>>>>> Ok, I modified the part of pg_dump where tremendous number of LOCK
>>>>> TABLE are issued. I replace them with single LOCK TABLE with multiple
>>>>> tables. With 100k tables LOCK statements took 13 minutes in total, now
>>>>> it only takes 3 seconds. Comments?
>>>>
>>>> Could you rebase this?  I tried doing it myself, but must have messed
>>>> it up because it got slower rather than faster.
>>>
>>> OK, I found the problem.  In fixing a merge conflict, I had it execute
>>> the query every time it appended a table, rather than just at the end.
>>>
>>> With my proposed patch in place, I find that for a full default dump
>>> your patch is slightly faster with < 300,000 tables, and slightly
>>> slower with > 300,000.  The differences are generally <2% in either
>>> direction.  When it comes to back-patching and partial dumps, I'm not
>>> really sure what to test.
>>>
>>> For the record, there is still a quadratic performance on the server,
>>> albeit with a much smaller constant factor than the Reassign one.  It
>>> is in get_tabstat_entry.  I don't know if is worth working on that in
>>> isolation--if PG is going to try to accommodate 100s of thousands of
>>> table, there probably needs to be a more general way to limit the
>>> memory used by all aspects of the rel caches.
>>
>> I would like to test your patch and w/without my patch. Could you
>> please give me the patches? Or do you have your own git repository?
>
> The main patch is in the commit fest as "Resource Owner reassign Locks
> for the sake of pg_dump"
>
> My re-basing of your patch is attached.

I tested your patches against the current master head. The result was pretty
good. Before, it took 125 minutes (with 9.2 devel) to dump 100k empty
tables; now it takes less than 4 minutes!

$ time pg_dump test >/dev/null

real    3m56.412s
user    0m12.059s
sys    0m3.571s

Good job!

Now I applied the rebased pg_dump patch.

real    4m1.779s
user    0m11.621s
sys    0m3.052s

Unfortunately I see no improvement. Probably my patch's value is in
dumping against an older backend.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Re: pg_dump and thousands of schemas

From
"Hugo "
Date:
Hi guys,

I just want to let you know that we have created our own solution to dump
and restore our databases. The tool was written in Java and the source is
now on GitHub (MIT license): https://github.com/tig100/JdbcPgBackup

The main approach was to cache all database objects - schemas, tables,
indexes, etc. - and, instead of having Postgres do joins between the
pg_catalog tables (which include thousands of schemas in pg_namespace and
millions of columns in pg_attribute), do full table scans and then find
which schema or table an object belongs to by looking it up in a hash map in
Java, keyed by schema and table OIDs. The dump is not transactionally safe,
so it should be performed only on a replica db (WAL importing disabled), not
on a live db. Some statistics:

Dump 11,000 schemas = 3 hours.
Dump 28,000 schemas = 8 hours.

You can read more about the tool on the github page.
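
In rough SQL terms, the idea is to pull each catalog in one full pass and
correlate the rows client-side by OID, rather than asking the server to join
the catalogs for every schema being dumped. A simplified sketch (the tool's
actual queries differ):

-- Read whole catalogs in single scans and match them up in client memory.
SELECT oid, nspname FROM pg_namespace;                     -- all schemas
SELECT oid, relname, relnamespace, relkind FROM pg_class;  -- all relations
SELECT attrelid, attnum, attname, atttypid
FROM pg_attribute
WHERE attnum > 0 AND NOT attisdropped;                     -- all columns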

Best regards,
Hugo




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/pg-dump-and-thousands-of-schemas-tp5709766p5718532.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

Re: pg_dump and thousands of schemas

From
Bruce Momjian
Date:
On Thu, May 31, 2012 at 09:20:43AM +0900, Tatsuo Ishii wrote:
> >> Yeah, Jeff's experiments indicated that the remaining bottleneck is lock
> >> management in the server.  What I fixed so far on the pg_dump side
> >> should be enough to let partial dumps run at reasonable speed even if
> >> the whole database contains many tables.  But if psql is taking
> >> AccessShareLock on lots of tables, there's still a problem.
> >
> > Ok, I modified the part of pg_dump where tremendous number of LOCK
> > TABLE are issued. I replace them with single LOCK TABLE with multiple
> > tables. With 100k tables LOCK statements took 13 minutes in total, now
> > it only takes 3 seconds. Comments?
>
> Shall I commit to master and all supported branches?

Was this applied?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: pg_dump and thousands of schemas

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> On Thu, May 31, 2012 at 09:20:43AM +0900, Tatsuo Ishii wrote:
>>> Ok, I modified the part of pg_dump where tremendous number of LOCK
>>> TABLE are issued. I replace them with single LOCK TABLE with multiple
>>> tables. With 100k tables LOCK statements took 13 minutes in total, now
>>> it only takes 3 seconds. Comments?

>> Shall I commit to master and all supported branches?

> Was this applied?

No, we fixed the server side instead.

            regards, tom lane


Re: pg_dump and thousands of schemas

From
Bruce Momjian
Date:
On Thu, Aug 30, 2012 at 04:51:56PM -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > On Thu, May 31, 2012 at 09:20:43AM +0900, Tatsuo Ishii wrote:
> >>> Ok, I modified the part of pg_dump where tremendous number of LOCK
> >>> TABLE are issued. I replace them with single LOCK TABLE with multiple
> >>> tables. With 100k tables LOCK statements took 13 minutes in total, now
> >>> it only takes 3 seconds. Comments?
>
> >> Shall I commit to master and all supported branches?
>
> > Was this applied?
>
> No, we fixed the server side instead.

Again, thanks.  I knew we had fixed the server, but it wasn't clear to me
that this made the client changes unnecessary; I think I now remember the
discussion about that.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: [HACKERS] pg_dump and thousands of schemas

From
Robert Haas
Date:
On Thu, Aug 30, 2012 at 4:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Bruce Momjian <bruce@momjian.us> writes:
>> On Thu, May 31, 2012 at 09:20:43AM +0900, Tatsuo Ishii wrote:
>>>> Ok, I modified the part of pg_dump where tremendous number of LOCK
>>>> TABLE are issued. I replace them with single LOCK TABLE with multiple
>>>> tables. With 100k tables LOCK statements took 13 minutes in total, now
>>>> it only takes 3 seconds. Comments?
>
>>> Shall I commit to master and all supported branches?
>
>> Was this applied?
>
> No, we fixed the server side instead.

But only for 9.2, right?  So people running back branches are still screwed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] pg_dump and thousands of schemas

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Aug 30, 2012 at 4:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Bruce Momjian <bruce@momjian.us> writes:
>>> On Thu, May 31, 2012 at 09:20:43AM +0900, Tatsuo Ishii wrote:
>>>> Ok, I modified the part of pg_dump where tremendous number of LOCK
>>>> TABLE are issued. I replace them with single LOCK TABLE with multiple
>>>> tables. With 100k tables LOCK statements took 13 minutes in total, now
>>>> it only takes 3 seconds. Comments?

>>> Was this applied?

>> No, we fixed the server side instead.

> But only for 9.2, right?  So people running back branches are still screwed.

Yeah, but they're screwed anyway, because there are a bunch of O(N^2)
behaviors involved here, not all of which are masked by what Tatsuo-san
suggested.

Six months or a year from now, we might have enough confidence in that
batch of 9.2 fixes to back-port them en masse.  Don't want to do it
today though.

            regards, tom lane


Re: [HACKERS] pg_dump and thousands of schemas

From
Jeff Janes
Date:
On Thu, Aug 30, 2012 at 8:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Aug 30, 2012 at 4:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Bruce Momjian <bruce@momjian.us> writes:
>>>> On Thu, May 31, 2012 at 09:20:43AM +0900, Tatsuo Ishii wrote:
>>>>> Ok, I modified the part of pg_dump where tremendous number of LOCK
>>>>> TABLE are issued. I replace them with single LOCK TABLE with multiple
>>>>> tables. With 100k tables LOCK statements took 13 minutes in total, now
>>>>> it only takes 3 seconds. Comments?
>
>>>> Was this applied?
>
>>> No, we fixed the server side instead.
>
>> But only for 9.2, right?  So people running back branches are still screwed.
>
> Yeah, but they're screwed anyway, because there are a bunch of O(N^2)
> behaviors involved here, not all of which are masked by what Tatsuo-san
> suggested.

All of the other ones that I know of were associated with pg_dump
itself, and since it is recommended to run the newer version of
pg_dump against the older version of the server, no back patching
would be necessary to get the benefits of those particular fixes.

> Six months or a year from now, we might have enough confidence in that
> batch of 9.2 fixes to back-port them en masse.  Don't want to do it
> today though.


What would be the recommendation for people trying to upgrade, but who
can't get their data out in a reasonable window?

Putting Tatsuo-san's change into a future pg_dump might be more
conservative than back-porting the server's Lock Table change to the
server version they are trying to get rid of.

Cheers,

Jeff


Re: [HACKERS] pg_dump and thousands of schemas

From
Robert Haas
Date:
On Sun, Sep 2, 2012 at 5:39 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Thu, Aug 30, 2012 at 8:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> On Thu, Aug 30, 2012 at 4:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> Bruce Momjian <bruce@momjian.us> writes:
>>>>> On Thu, May 31, 2012 at 09:20:43AM +0900, Tatsuo Ishii wrote:
>>>>>> Ok, I modified the part of pg_dump where tremendous number of LOCK
>>>>>> TABLE are issued. I replace them with single LOCK TABLE with multiple
>>>>>> tables. With 100k tables LOCK statements took 13 minutes in total, now
>>>>>> it only takes 3 seconds. Comments?
>>
>>>>> Was this applied?
>>
>>>> No, we fixed the server side instead.
>>
>>> But only for 9.2, right?  So people running back branches are still screwed.
>>
>> Yeah, but they're screwed anyway, because there are a bunch of O(N^2)
>> behaviors involved here, not all of which are masked by what Tatsuo-san
>> suggested.
>
> All of the other ones that I know of were associated with pg_dump
> itself, and since it is recommended to run the newer version of
> pg_dump against the older version of the server, no back patching
> would be necessary to get the benefits of those particular fixes.
>
>> Six months or a year from now, we might have enough confidence in that
>> batch of 9.2 fixes to back-port them en masse.  Don't want to do it
>> today though.
>
>
> What would be the recommendation for people trying to upgrade, but who
> can't get their data out in a reasonable window?
>
> Putting Tatsuo-san's change into a future pg_dump might be more
> conservative than back-porting the server's Lock Table change to the
> server version they are trying to get rid of.

What he said.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] pg_dump and thousands of schemas

From
Denis
Date:
I've read all the posts in this thread, and as I understand it, some patches
were applied in version 9.2 to improve pg_dump speed. I've just installed
PostgreSQL 9.2.1 and I still have the same problem. I have a database with
2600 schemas in it. I try to dump each schema individually, but it takes too
much time for every schema (about 30-40 seconds per schema, no matter what
the data size is). Also, for each schema dump I get a slow query log entry;
here is an example:

>2012-11-06 13:15:32 GMTLOG:  duration: 12029.334 ms  statement: SELECT
c.tableoid, c.oid, c.relname, c.relacl, c.relkind, c.relnamespace, (SELECT
rolname FROM pg_catalog.pg_roles WHERE oid = c.relowner) AS rolname,
c.relchecks, c.relhastriggers, c.relhasindex, c.relhasrules, c.relhasoids,
c.relfrozenxid, tc.oid AS toid, tc.relfrozenxid AS tfrozenxid,
c.relpersistence, CASE WHEN c.reloftype <> 0 THEN
c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype, d.refobjid AS
owning_tab, d.refobjsubid AS owning_col, (SELECT spcname FROM pg_tablespace
t WHERE t.oid = c.reltablespace) AS reltablespace,
array_to_string(c.reloptions, ', ') AS reloptions,
array_to_string(array(SELECT 'toast.' || x FROM unnest(tc.reloptions) x), ',
') AS toast_reloptions FROM pg_class c LEFT JOIN pg_depend d ON (c.relkind =
'S' AND d.classid = c.tableoid AND d.objid = c.oid AND d.objsubid = 0 AND
d.refclassid = c.tableoid AND d.deptype = 'a') LEFT JOIN pg_class tc ON
(c.reltoastrelid = tc.oid) WHERE c.relkind in ('r', 'S', 'v', 'c', 'f')
ORDER BY c.oid



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/pg-dump-and-thousands-of-schemas-tp5709766p5730864.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


Re: [HACKERS] pg_dump and thousands of schemas

From
Tom Lane
Date:
Denis <socsam@gmail.com> writes:
> I've read all the posts in thread, and as I understood in version 9.2 some
> patches were applied to improve pg_dump speed. I've just installed
> PostgreSQL 9.2.1 and I still have the same problem. I have a database with
> 2600 schemas in it. I try to dump each schema individually, but it takes too
> much time for every schema (about 30-40 seconds per schema, no matter what
> the data size is).

Could you provide a test case for that?  Maybe the output of pg_dump -s,
anonymized as you see fit?

> Also for each schema dump I have a slow query log entry,

Could you provide EXPLAIN ANALYZE output for that query?

            regards, tom lane


Re: [HACKERS] pg_dump and thousands of schemas

From
Denis
Date:
Tom Lane-2 wrote
> Denis <socsam@gmail.com> writes:
>> I've read all the posts in thread, and as I understood in version 9.2
>> some
>> patches were applied to improve pg_dump speed. I've just installed
>> PostgreSQL 9.2.1 and I still have the same problem. I have a database
>> with
>> 2600 schemas in it. I try to dump each schema individually, but it takes
>> too
>> much time for every schema (about 30-40 seconds per schema, no matter
>> what
>> the data size is).
>
> Could you provide a test case for that?  Maybe the output of pg_dump -s,
> anonymized as you see fit?
>
>> Also for each schema dump I have a slow query log entry,
>
> Could you provide EXPLAIN ANALYZE output for that query?
>
>             regards, tom lane
>
>

Here is the output of EXPLAIN ANALYZE. It took 5 seconds this time, but it
usually takes 10 to 15 seconds when I am doing a backup.

Sort  (cost=853562.04..854020.73 rows=183478 width=219) (actual time=5340.477..5405.604 rows=183924 loops=1)
   Sort Key: c.oid
   Sort Method: external merge  Disk: 33048kB
   ->  Hash Left Join  (cost=59259.80..798636.25 rows=183478 width=219) (actual time=839.297..4971.299 rows=183924 loops=1)
         Hash Cond: (c.reltoastrelid = tc.oid)
         ->  Hash Right Join  (cost=29530.77..146976.65 rows=183478 width=183) (actual time=404.959..3261.462 rows=183924 loops=1)
               Hash Cond: ((d.classid = c.tableoid) AND (d.objid = c.oid) AND (d.refclassid = c.tableoid))
               Join Filter: (c.relkind = 'S'::"char")
               ->  Seq Scan on pg_depend d  (cost=0.00..71403.54 rows=995806 width=20) (actual time=1.137..878.571 rows=998642 loops=1)
                     Filter: ((objsubid = 0) AND (deptype = 'a'::"char"))
                     Rows Removed by Filter: 2196665
               ->  Hash  (cost=21839.91..21839.91 rows=183478 width=175) (actual time=402.947..402.947 rows=183924 loops=1)
                     Buckets: 1024  Batches: 32  Memory Usage: 876kB
                     ->  Seq Scan on pg_class c  (cost=0.00..21839.91 rows=183478 width=175) (actual time=0.017..267.614 rows=183924 loops=1)
                           Filter: (relkind = ANY ('{r,S,v,c,f}'::"char"[]))
                           Rows Removed by Filter: 383565
         ->  Hash  (cost=18333.79..18333.79 rows=560979 width=40) (actual time=434.258..434.258 rows=567489 loops=1)
               Buckets: 4096  Batches: 32  Memory Usage: 703kB
               ->  Seq Scan on pg_class tc  (cost=0.00..18333.79 rows=560979 width=40) (actual time=0.003..273.418 rows=567489 loops=1)
         SubPlan 1
           ->  Seq Scan on pg_authid  (cost=0.00..1.01 rows=1 width=68) (actual time=0.001..0.001 rows=1 loops=183924)
                 Filter: (oid = c.relowner)
                 Rows Removed by Filter: 2
         SubPlan 2
           ->  Seq Scan on pg_tablespace t  (cost=0.00..1.02 rows=1 width=64) (actual time=0.001..0.001 rows=0 loops=183924)
                 Filter: (oid = c.reltablespace)
                 Rows Removed by Filter: 2
         SubPlan 3
           ->  Function Scan on unnest x  (cost=0.00..1.25 rows=100 width=32) (actual time=0.001..0.001 rows=0 loops=183924)
 Total runtime: 5428.498 ms

Here is the output of "pg_dump -s"  test.dump
<http://postgresql.1045698.n5.nabble.com/file/n5730877/test.dump>



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/pg-dump-and-thousands-of-schemas-tp5709766p5730877.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


Re: [HACKERS] pg_dump and thousands of schemas

From
Tom Lane
Date:
Denis <socsam@gmail.com> writes:
> Here is the output of EXPLAIN ANALYZE. It took 5 seconds but usually it
> takes from 10 to 15 seconds when I am doing backup.

> Sort  (cost=853562.04..854020.73 rows=183478 width=219) (actual
> time=5340.477..5405.604 rows=183924 loops=1)

Hmmm ... so the problem here isn't that you've got 2600 schemas, it's
that you've got 183924 tables.  That's going to take some time no matter
what.

It does seem like we could make some small changes to optimize that
query a little bit, but they're not going to result in any amazing
improvement overall, because pg_dump still has to deal with all the
tables it's getting back.  Fundamentally, I would ask whether you really
need so many tables.  It seems pretty likely that you have lots and lots
of basically-identical tables.  Usually it would be better to redesign
such a structure into fewer tables with more index columns.
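
As a minimal sketch of that kind of redesign (names invented here), a
comments table repeated in every customer schema could become one shared
table with the customer as a leading, indexed key:

-- One shared table instead of one table per customer schema.
CREATE TABLE comments (
    customer_id integer NOT NULL,
    id          serial  NOT NULL,
    body        text    NOT NULL,
    posted      timestamp with time zone NOT NULL,
    PRIMARY KEY (customer_id, id)
);
CREATE INDEX comments_customer_posted_idx ON comments (customer_id, posted);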

> Here is the output of "pg_dump -s"  test.dump
> <http://postgresql.1045698.n5.nabble.com/file/n5730877/test.dump>

This dump contains only 1 schema and 43 tables, so I don't think it's
for the database you're having trouble with ...

            regards, tom lane


Re: [HACKERS] pg_dump and thousands of schemas

From
Denis
Date:
Tom Lane-2 wrote
> Denis <socsam@gmail.com> writes:
>> Here is the output of EXPLAIN ANALYZE. It took 5 seconds but usually it
>> takes from 10 to 15 seconds when I am doing backup.
>
>> Sort  (cost=853562.04..854020.73 rows=183478 width=219) (actual
>> time=5340.477..5405.604 rows=183924 loops=1)
>
> Hmmm ... so the problem here isn't that you've got 2600 schemas, it's
> that you've got 183924 tables.  That's going to take some time no matter
> what.
>
> It does seem like we could make some small changes to optimize that
> query a little bit, but they're not going to result in any amazing
> improvement overall, because pg_dump still has to deal with all the
> tables it's getting back.  Fundamentally, I would ask whether you really
> need so many tables.  It seems pretty likely that you have lots and lots
> of basically-identical tables.  Usually it would be better to redesign
> such a structure into fewer tables with more index columns.
>
>> Here is the output of "pg_dump -s"  test.dump
>> <http://postgresql.1045698.n5.nabble.com/file/n5730877/test.dump>
>
> This dump contains only 1 schema and 43 tables, so I don't think it's
> for the database you're having trouble with ...
>
>             regards, tom lane
>
>

I wonder why pg_dump has to deal with all these 183924 tables when I
specified that it dump only one schema ("pg_dump -n schema_name"), or even
just one table ("pg_dump -t 'schema_name.comments'")?

We have a web application where we create a schema with a number of tables
in it for each customer. This architecture was chosen to ease the process of
backing up and restoring data. Sometimes clients ask us to restore data for
the last month or roll back to last week's state. This task is easy to
accomplish when the client's data is isolated in a schema/DB. If we put all
the clients' data in one table, operations of this kind will be much harder
to perform: we would have to restore a huge DB with enormously large tables
in it just to find the requested data.
Different clients have different activity rates, and we can select different
backup strategies accordingly. This would be impossible if we kept all the
clients' data in one table.
Besides all of the above, the probability of massive data corruption (if an
error occurs in our web application) is much higher.





--
View this message in context:
http://postgresql.1045698.n5.nabble.com/pg-dump-and-thousands-of-schemas-tp5709766p5730998.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


Re: [HACKERS] pg_dump and thousands of schemas

From
Tom Lane
Date:
Denis <socsam@gmail.com> writes:
> Tom Lane-2 wrote
>> Hmmm ... so the problem here isn't that you've got 2600 schemas, it's
>> that you've got 183924 tables.  That's going to take some time no matter
>> what.

> I wonder why pg_dump has to have deal with all these 183924 tables, if I
> specified to dump only one scheme: "pg_dump -n schema_name" or even like
> this to dump just one table "pg_dump -t 'schema_name.comments' "  ?

It has to know about all the tables even if it's not going to dump them
all, for purposes such as dependency analysis.

> We have a web application where we create a schema with a number of tables
> in it for each customer. This architecture was chosen to ease the process of
> backup/restoring data.

I find that argument fairly dubious, but in any case you should not
imagine that hundreds of thousands of tables are going to be cost-free.

            regards, tom lane


Re: [HACKERS] pg_dump and thousands of schemas

From
Denis
Date:
Tom Lane-2 wrote
> Denis <socsam@gmail.com> writes:
>> Tom Lane-2 wrote
>>> Hmmm ... so the problem here isn't that you've got 2600 schemas, it's
>>> that you've got 183924 tables.  That's going to take some time no matter
>>> what.
>
>> I wonder why pg_dump has to have deal with all these 183924 tables, if I
>> specified to dump only one scheme: "pg_dump -n schema_name" or even like
>> this to dump just one table "pg_dump -t 'schema_name.comments' "  ?
>
> It has to know about all the tables even if it's not going to dump them
> all, for purposes such as dependency analysis.
>
>> We have a web application where we create a schema with a number of
>> tables
>> in it for each customer. This architecture was chosen to ease the process
>> of
>> backup/restoring data.
>
> I find that argument fairly dubious, but in any case you should not
> imagine that hundreds of thousands of tables are going to be cost-free.
>
>             regards, tom lane
>
>

Still I can't understand why pg_dump has to know about all the tables. For
example, I have a simple table like this:
CREATE TABLE "CLog" (
    "fromUser" integer,
    "toUser" integer,
    message character varying(2048) NOT NULL,
    "dateSend" timestamp without time zone NOT NULL
);
It has no foreign keys, it doesn't use partitioning, and it doesn't have any
relations to any other table. Why does pg_dump have to gather information
about ALL the tables in the database just to dump this one table?



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/pg-dump-and-thousands-of-schemas-tp5709766p5731188.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


Re: [HACKERS] pg_dump and thousands of schemas

From
Jeff Janes
Date:
On Thu, Nov 8, 2012 at 1:04 AM, Denis <socsam@gmail.com> wrote:
>
> Still I can't undesrtand why pg_dump has to know about all the tables?

Strictly speaking it probably doesn't need to.  But it is primarily
designed for dumping entire databases, and the efficient way to do
that is to read it all into memory in a few queries and then sort out
the dependencies, rather than tracking down every dependency
individually with one or more trips back to the database.  (Although
it still does make plenty of trips back to the database per
table/sequence, for acls, defaults, and attributes.)

If you were to rewrite pg_dump from the ground up to achieve your
specific needs (dumping one schema, with no dependencies to other
schemata) you could probably make it much more efficient.  But
then it wouldn't be pg_dump, it would be something else.
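
The dependency bookkeeping mentioned above comes from pg_depend, and the kind
of whole-database read involved looks roughly like this (a simplified sketch;
pg_dump's real query differs in detail):

-- One pass over pg_depend for the entire database; ordering is then
-- resolved client-side.  'n' = normal and 'a' = auto dependencies.
SELECT classid, objid, objsubid, refclassid, refobjid, deptype
FROM pg_depend
WHERE deptype IN ('n', 'a');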

Cheers,

Jeff


Re: [HACKERS] pg_dump and thousands of schemas

From
Denis
Date:
Jeff Janes wrote
> On Thu, Nov 8, 2012 at 1:04 AM, Denis <socsam@gmail.com> wrote:
>>
>> Still I can't undesrtand why pg_dump has to know about all the tables?
>
> Strictly speaking it probably doesn't need to.  But it is primarily
> designed for dumping entire databases, and the efficient way to do
> that is to read it all into memory in a few queries and then sort out
> the dependencies, rather than tracking down every dependency
> individually with one or more trips back to the database.  (Although
> it still does make plenty of trips back to the database per
> table/sequence, for acls, defaults, attributes.
>
> If you were to rewrite pg_dump from the ground up to achieve your
> specific needs (dumping one schema, with no dependencies between to
> other schemata) you could probably make it much more efficient.  But
> then it wouldn't be pg_dump, it would be something else.
>
> Cheers,
>
> Jeff
>
>

Please don't think that I'm trying to nitpick here, but pg_dump has options
for dumping separate tables, and that's not really consistent with the idea
that "pg_dump is primarily designed for dumping entire databases".



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/pg-dump-and-thousands-of-schemas-tp5709766p5731900.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


Re: [HACKERS] pg_dump and thousands of schemas

From
Jeff Janes
Date:
On Tue, Nov 13, 2012 at 7:12 PM, Denis <socsam@gmail.com> wrote:
> Jeff Janes wrote
>> On Thu, Nov 8, 2012 at 1:04 AM, Denis <socsam@gmail.com> wrote:
>>>
>>> Still I can't undesrtand why pg_dump has to know about all the tables?
>>
>> Strictly speaking it probably doesn't need to.  But it is primarily
>> designed for dumping entire databases, and the efficient way to do
>> that is to read it all into memory in a few queries and then sort out
>> the dependencies, rather than tracking down every dependency
>> individually with one or more trips back to the database.  (Although
>> it still does make plenty of trips back to the database per
>> table/sequence, for acls, defaults, attributes.
>>
>> If you were to rewrite pg_dump from the ground up to achieve your
>> specific needs (dumping one schema, with no dependencies between to
>> other schemata) you could probably make it much more efficient.  But
>> then it wouldn't be pg_dump, it would be something else.
>>
>
> Please don't think that I'm trying to nitpick here, but pg_dump has options
> for dumping separate tables and that's not really consistent with the idea
> that "pg_dump is primarily  designed for dumping entire databases".

I think it is compatible.  From my reading of pg_dump, those other
options seem to have been bolted on as an afterthought, not as part of
its primary design.

Cheers,

Jeff


Re: [HACKERS] pg_dump and thousands of schemas

From
Andrew Dunstan
Date:
On 11/13/2012 10:12 PM, Denis wrote:
> Please don't think that I'm trying to nitpick here, but pg_dump has options
> for dumping separate tables and that's not really consistent with the idea
> that "pg_dump is primarily  designed for dumping entire databases".
>
>


Sure it is. The word "primarily" is not just a noise word here.

The fact that we have options to do other things doesn't mean that its
primary design goal has changed.


cheers

andrew