Thread: pg_dump and thousands of schemas
Hi everyone,

We have a production database (PostgreSQL 9.0) with more than 20,000 schemas and 40 GB in size. In the past we had all that information in just one schema, and pg_dump used to work just fine (2-3 hours to dump everything). Then we decided to split the database into schemas, which makes a lot of sense for the kind of information we store and the plans we have for the future.

The problem now is that pg_dump takes forever to finish (more than 24 hours) and we just can't have consistent daily backups like we had in the past. When I try to dump just one schema with almost nothing in it, it takes 12 minutes. When I try to dump a big schema with lots of information, it takes 14 minutes. So pg_dump is clearly lost in the middle of so many schemas.

The load on the machine is low (it is a hot standby replica) and we have good settings for memory, cache, shared_buffers and everything else. The performance of the database itself is good; it is only pg_dump that is inefficient for the task.

I found an old discussion from 2007 that seems to be quite related to this problem:
http://postgresql.1045698.n5.nabble.com/5-minutes-to-pg-dump-nothing-tp1888814.html

It seems that pg_dump hasn't been tested with a huge number of schemas like that. Does anyone have a solution or suggestions? Do you know if there are patches specific for this case?

Thanks in advance,
Hugo
On Thu, May 24, 2012 at 12:06 AM, Hugo <Nabble> <hugo.tech@gmail.com> wrote:
How many total relations do you have? I don't know if there is a limit to the number of schemas, but I suspect when you went from one schema to 20,000 schemas, you also went from N relations to 20000*N relations.
Somewhere between 100,000 and 1 million total relations, Postgres starts to have trouble. See this thread:
http://permalink.gmane.org/gmane.comp.db.postgresql.performance/33254
(Why is it that Google can't find these archives on postgresql.org?)
Craig
On Thu, May 24, 2012 at 8:21 AM, Craig James <cjames@emolecules.com> wrote:
> On Thu, May 24, 2012 at 12:06 AM, Hugo <Nabble> <hugo.tech@gmail.com> wrote:
>> We have a production database (postgresql 9.0) with more than 20,000 schemas and 40Gb size. In the past we had all that information in just one schema and pg_dump used to work just fine (2-3 hours to dump everything). Then we decided to split the database into schemas, which makes a lot of sense for the kind of information we store and the plans we have for the future. The problem now is that pg_dump takes forever to finish (more than 24 hours) and we just can't have consistent daily backups like we had in the past. When I try to dump just one schema with almost nothing in it, it takes 12 minutes.

Sorry, your original did not show up here, so I'm piggy-backing on Craig's reply.

Is dumping just one schema out of thousands an actual use case, or is it just an attempt to find a faster way to dump all the schemata through a back door?

pg_dump itself seems to have a lot of quadratic portions (plus another one on the server which it hits pretty heavily), and it is hard to know where to start addressing them. It seems like addressing the overall quadratic nature might be a globally better option, but addressing just the problem with dumping one schema might be easier to kluge together.

>> When I try to dump a big schema with lots of information, it takes 14 minutes. So pg_dump is clearly lost in the middle of so many schemas. The load on the machine is low (it is a hot standby replica db) and we have good configurations for memory, cache, shared_buffers and everything else. It seems that pg_dump hasn't been tested with a huge number of schemas like that. Does anyone have a solution or suggestions? Do you know if there are patches specific for this case?
>
> How many total relations do you have? I don't know if there is a limit to the number of schemas, but I suspect when you went from one schema to 20,000 schemas, you also went from N relations to 20000*N relations.

Yes, that might be important to know--whether the total number of relations changed, or just their distribution amongst the schemata.

Cheers,

Jeff
On Thu, May 24, 2012 at 08:20:34PM -0700, Jeff Janes wrote:
> Sorry, your original did not show up here, so I'm piggy-backing on Craig's reply.
>
> Is dumping just one schema out of thousands an actual use case, or is it just an attempt to find a faster way to dump all the schemata through a back door?
>
> pg_dump itself seems to have a lot of quadratic portions (plus another one on the server which it hits pretty heavily), and it is hard to know where to start addressing them. It seems like addressing the overall quadratic nature might be a globally better option, but addressing just the problem with dumping one schema might be easier to kluge together.

Postgres 9.2 will have some speedups for pg_dump scanning large databases --- that might help.

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com
Thanks for the replies. The number of relations in the database is really high (~500,000) and I don't think we can shrink that. The truth is that schemas bring a lot of advantages to our system and PostgreSQL doesn't show signs of stress with them. So I believe it should also be possible for pg_dump to handle them with the same elegance.

Dumping just one schema out of thousands was indeed an attempt to find a faster way to back up the database. I don't mind creating a shell script or program that dumps every schema individually, as long as each dump is fast enough to keep the total time within a few hours. But since each dump currently takes at least 12 minutes, that just doesn't work.

I have been looking at the source of pg_dump in order to find possible improvements, but this will certainly take days or even weeks. We will probably have to use 'tar' to compress the postgresql folder as the backup solution for now, until we can fix pg_dump or wait for PostgreSQL 9.2 to become the official version (as long as I don't need a dump and restore to upgrade the db).

If anyone has more suggestions, I would like to hear them. Thank you!

Regards,
Hugo
Hi,

On 25 May 2012 14:54, Hugo <Nabble> <hugo.tech@gmail.com> wrote:
> Thanks for the replies. The number of relations in the database is really high (~500,000) and I don't think we can shrink that. The truth is that schemas bring a lot of advantages to our system and postgresql doesn't show signs of stress with them. So I believe it should also be possible for pg_dump to handle them with the same elegance.
>
> If anyone has more suggestions, I would like to hear them. Thank you!

Maybe filesystem-level backup could solve this issue:
http://www.postgresql.org/docs/9.1/static/continuous-archiving.html#BACKUP-BASE-BACKUP

but keep in mind that:
- it preserves bloat in your database, thus the backup might need more space
- you can't restore to a different PG version

--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)
"Hugo <Nabble>" <hugo.tech@gmail.com> writes: > If anyone has more suggestions, I would like to hear them. Thank you! Provide a test case? We recently fixed a couple of O(N^2) loops in pg_dump, but those covered extremely specific cases that might or might not have anything to do with what you're seeing. The complainant was extremely helpful about tracking down the problems: http://archives.postgresql.org/pgsql-general/2012-03/msg00957.php http://archives.postgresql.org/pgsql-committers/2012-03/msg00225.php http://archives.postgresql.org/pgsql-committers/2012-03/msg00230.php regards, tom lane
On Fri, May 25, 2012 at 10:41:23AM -0400, Tom Lane wrote:
> "Hugo <Nabble>" <hugo.tech@gmail.com> writes:
>> If anyone has more suggestions, I would like to hear them. Thank you!
>
> Provide a test case?
>
> We recently fixed a couple of O(N^2) loops in pg_dump, but those covered extremely specific cases that might or might not have anything to do with what you're seeing. The complainant was extremely helpful about tracking down the problems:
> http://archives.postgresql.org/pgsql-general/2012-03/msg00957.php
> http://archives.postgresql.org/pgsql-committers/2012-03/msg00225.php
> http://archives.postgresql.org/pgsql-committers/2012-03/msg00230.php

Yes, please help us improve this! At this point pg_upgrade is limited by the time to dump/restore the database schema, but I can't get users to give me any way to debug the speed problems. Someone reported pg_upgrade took 45 minutes because of pg_dumpall --schema-only, which is quite long.

--
Bruce Momjian
On Thu, May 24, 2012 at 8:54 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Thu, May 24, 2012 at 08:20:34PM -0700, Jeff Janes wrote:
>> pg_dump itself seems to have a lot of quadratic portions (plus another one on the server which it hits pretty heavily), and it is hard to know where to start addressing them. It seems like addressing the overall quadratic nature might be a globally better option, but addressing just the problem with dumping one schema might be easier to kluge together.
>
> Postgres 9.2 will have some speedups for pg_dump scanning large databases --- that might help.

Those speedups don't seem to apply here, though. I get the same performance in 9.0.7 as 9.2.beta1.

There is an operation in pg_dump which is O(#_of_schemata_in_db * #_of_tables_in_db), or something like that. The attached very crude patch reduces that to O(log_of_#_of_schemata_in_db * #_of_tables_in_db).

I was hoping this would be a general improvement. It doesn't seem to be. But it is a very substantial improvement in the specific case of dumping one small schema out of a very large database.

It seems like dumping one schema would be better optimized by not loading up the entire database catalog, but rather by restricting to just that schema at the catalog stage. But I haven't dug into those details.

For dumping entire databases, it looks like the biggest problem is going to be LockReassignCurrentOwner in the server. And that doesn't seem to be easy to fix, as any change to it to improve pg_dump will risk degrading normal use cases.

If we want to be able to efficiently dump entire databases in a scalable way, it seems like there should be some way to obtain a database-wide AccessShare lock, which blocks AccessExclusive locks on any object in the database, and turns ordinary object-level AccessShare lock requests into no-ops.

I don't think you can get hundreds of thousands of simultaneously held and individually recorded AccessShare locks without causing bad things to happen.

Cheers,

Jeff
[Attachment]
On Fri, May 25, 2012 at 8:18 AM, Bruce Momjian <bruce@momjian.us> wrote:
> On Fri, May 25, 2012 at 10:41:23AM -0400, Tom Lane wrote:
>> Provide a test case?
>
> Yes, please help us improve this! At this point pg_upgrade is limited by the time to dump/restore the database schema, but I can't get users to give me any way to debug the speed problems.

For dumping one small schema from a large database, look at the time progression of this:

dropdb foo; createdb foo;

for f in `seq 0 10000 1000000`; do
  perl -le 'print "create schema foo$_; create table foo$_.foo (k integer, v integer);" foreach $ARGV[0]..$ARGV[0]+9999' $f | psql -d foo > /dev/null ;
  time pg_dump foo -Fc -n foo1 | wc -c;
done >& dump_one_schema_timing

To show the overall dump speed problem, drop the "-n foo1", and change the step size from 10000/9999 down to 1000/999.

Cheers,

Jeff
Jeff Janes <jeff.janes@gmail.com> writes:
> There is an operation in pg_dump which is O(#_of_schemata_in_db * #_of_tables_in_db), or something like that. The attached very crude patch reduces that to O(log_of_#_of_schemata_in_db * #_of_tables_in_db).
>
> I was hoping this would be a general improvement. It doesn't seem to be. But it is a very substantial improvement in the specific case of dumping one small schema out of a very large database.

Your test case in <CAMkU=1zedM4VyLVyLuVmoekUnUXkXfnGPer+3bvPm-A_9CNYSA@mail.gmail.com> shows pretty conclusively that findNamespace is a time sink for large numbers of schemas, so that seems worth fixing. I don't like this patch though: we already have infrastructure for this in pg_dump, namely buildIndexArray/findObjectByOid, so what we should do is use that, not invent something new. I will go see about doing that.

> It seems like dumping one schema would be better optimized by not loading up the entire database catalog, but rather by restricting to just that schema at the catalog stage.

The reason pg_dump is not built that way is that considerations like dump order dependencies are not going to work at all if it only looks at a subset of the database. Of course, dependency chains involving objects not dumped might be problematic anyway, but I'd still want it to do the best it could.

> For dumping entire databases, it looks like the biggest problem is going to be LockReassignCurrentOwner in the server. And that doesn't seem to be easy to fix, as any change to it to improve pg_dump will risk degrading normal use cases.

I didn't try profiling the server side, but pg_dump doesn't use subtransactions, so it's not clear to me why LockReassignCurrentOwner would get called at all ...

            regards, tom lane
On Fri, May 25, 2012 at 9:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> For dumping entire databases, it looks like the biggest problem is going to be LockReassignCurrentOwner in the server. And that doesn't seem to be easy to fix, as any change to it to improve pg_dump will risk degrading normal use cases.
>
> I didn't try profiling the server side, but pg_dump doesn't use subtransactions, so it's not clear to me why LockReassignCurrentOwner would get called at all ...

I thought that every select statement in a repeatable read transaction ran in a separate "portal", and that a portal is a flavor of subtransaction. Anyway, it does show up at the top of a profile of the server, so it is getting called somehow.

Cheers,

Jeff
Jeff Janes <jeff.janes@gmail.com> writes:
> For dumping entire databases, it looks like the biggest problem is going to be LockReassignCurrentOwner in the server. And that doesn't seem to be easy to fix, as any change to it to improve pg_dump will risk degrading normal use cases.
>
> If we want to be able to efficiently dump entire databases in a scalable way, it seems like there should be some way to obtain a database-wide AccessShare lock, which blocks AccessExclusive locks on any object in the database, and turns ordinary object-level AccessShare lock requests into no-ops.

I thought a little bit about that, but it seems fairly unworkable. In the first place, pg_dump doesn't necessarily want a lock on every table in the database. In the second, such a lock mechanism would have logical difficulties, notably whether it would be considered to apply to tables created after the lock request occurs. If it does, then it would effectively block all such creations (since creation takes exclusive locks that ought to conflict). If it doesn't, how would you implement that? In any case, we'd be adding significant cost and complexity to lock acquisition operations, for something that only whole-database pg_dump operations could conceivably make use of.

As far as the specific problem at hand goes, I think there might be a less invasive solution. I poked into the behavior with gdb (and you're right, LockReassignCurrentOwner does get called during portal drop) and noted that although pg_dump is indeed holding thousands of locks, any given statement that it issues touches only a few of them. So the loop in LockReassignCurrentOwner iterates over the whole lock table but does something useful at only a few entries.

We could fix things for this usage pattern with what seems to me to be a pretty low-overhead method: add a fixed-size array to ResourceOwners in which we can remember up to N LOCALLOCKs, for N around 10 or so. Add a LOCALLOCK to that array when we add the ResourceOwner to that LOCALLOCK, so long as the array hasn't overflowed. (If the array does overflow, we mark it as overflowed and stop adding entries.) Then, in LockReassignCurrentOwner, we only iterate over the whole hash table if the ResourceOwner's array has overflowed. If it hasn't, use the array to visit just the LOCALLOCKs that need work.

Comments?

            regards, tom lane
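(To make the shape of that proposal concrete, here is a minimal sketch in C. It is not the actual PostgreSQL code; MAX_RESOWNER_LOCKS, the lock-tracking fields, ResourceOwnerRememberLock and ReassignOneLock are illustrative names, not real identifiers.)

    #define MAX_RESOWNER_LOCKS 10          /* "N around 10 or so" */

    typedef struct ResourceOwnerData
    {
        /* ... existing ResourceOwner fields ... */
        int         nlocks;                         /* number of remembered locks */
        LOCALLOCK  *locks[MAX_RESOWNER_LOCKS];      /* remembered LOCALLOCKs */
        bool        lock_overflowed;                /* gave up remembering? */
    } ResourceOwnerData;

    /* Called whenever this owner is added to a LOCALLOCK. */
    static void
    ResourceOwnerRememberLock(ResourceOwner owner, LOCALLOCK *locallock)
    {
        if (owner->lock_overflowed)
            return;                                 /* already fell back to full scans */
        if (owner->nlocks >= MAX_RESOWNER_LOCKS)
            owner->lock_overflowed = true;          /* mark overflowed, stop tracking */
        else
            owner->locks[owner->nlocks++] = locallock;
    }

    /* The idea applied inside LockReassignCurrentOwner: */
    static void
    LockReassignCurrentOwnerSketch(ResourceOwner owner, ResourceOwner parent)
    {
        if (!owner->lock_overflowed)
        {
            int         i;

            /* fast path: visit only the few LOCALLOCKs this owner touched */
            for (i = 0; i < owner->nlocks; i++)
                ReassignOneLock(owner->locks[i], parent);   /* hypothetical helper */
        }
        else
        {
            /* slow path: iterate over the entire local lock hash table, as today */
        }
    }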
On Fri, May 25, 2012 at 1:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> If we want to be able to efficiently dump entire databases in a scalable way, it seems like there should be some way to obtain a database-wide AccessShare lock, which blocks AccessExclusive locks on any object in the database, and turns ordinary object-level AccessShare lock requests into no-ops.
>
> I thought a little bit about that, but it seems fairly unworkable. In the first place, pg_dump doesn't necessarily want a lock on every table in the database.

The database-wide method could be invoked only when there are no options given to pg_dump that limit it to a subset. Or does that not resolve the objection?

> In the second, such a lock mechanism would have logical difficulties, notably whether it would be considered to apply to tables created after the lock request occurs. If it does, then it would effectively block all such creations (since creation takes exclusive locks that ought to conflict).

That seems acceptable to me. With an unrestricted dump, almost all other DDL is locked out already; I don't know that locking out one more thing is that big a deal. Especially if there is some way to circumvent the use of that feature.

> If it doesn't, how would you implement that? In any case, we'd be adding significant cost and complexity to lock acquisition operations, for something that only whole-database pg_dump operations could conceivably make use of.

Before Robert's fast-path locks were developed, I wanted a way to put the server into a 'stable schema' mode where AccessExclusive locks were forbidden and AccessShare locks were no-ops, just for performance reasons. Now with fast-path, that might no longer be a meaningful feature.

If databases scale out a lot, won't max_locks_per_transaction, and the amount of shared memory it would require to keep increasing it, become a substantial problem?

> As far as the specific problem at hand goes, I think there might be a less invasive solution. I poked into the behavior with gdb (and you're right, LockReassignCurrentOwner does get called during portal drop) and noted that although pg_dump is indeed holding thousands of locks, any given statement that it issues touches only a few of them. So the loop in LockReassignCurrentOwner iterates over the whole lock table but does something useful at only a few entries.
>
> We could fix things for this usage pattern with what seems to me to be a pretty low-overhead method: add a fixed-size array to ResourceOwners in which we can remember up to N LOCALLOCKs, for N around 10 or so.

I had thought along these terms too. I think 10 would capture most of the gain. With pg_dump, so far I see a huge number of resource owners with the maximum number of locks being 0, 2 or 4, and only a handful with more than 4. Of course I haven't looked at all use cases.

The reason we want to limit at all is not memory, but rather so that explicitly removing locks doesn't have to dig through a large list to find the specific one to remove, and therefore become quadratic in the case that many locks are explicitly removed, right? Does anyone ever add a bunch of locks, and then afterward go through and explicitly remove them all in FIFO order? I think most users would either remove them LIFO, or drop them in bulk. But better safe than sorry.

> Add a LOCALLOCK to that array when we add the ResourceOwner to that LOCALLOCK, so long as the array hasn't overflowed. (If the array does overflow, we mark it as overflowed and stop adding entries.) Then, in LockReassignCurrentOwner, we only iterate over the whole hash table if the ResourceOwner's array has overflowed. If it hasn't, use the array to visit just the LOCALLOCKs that need work.
>
> Comments?

I have some basic parts of this already coded up. I can try to finish coding it up for the next CF or the one after. I'm not yet sure how to avoid weakening the boundary between resowner.c and lock.c; my original code was pretty ugly there, as it was just a proof of concept.

What would be a situation that might be adversely affected by the overhead of such a change? I think pgbench -S, except implemented in a plpgsql loop, would probably do it.

Cheers,

Jeff
> "Hugo <Nabble>" <hugo.tech@gmail.com> writes: >> If anyone has more suggestions, I would like to hear them. Thank you! > > Provide a test case? > > We recently fixed a couple of O(N^2) loops in pg_dump, but those covered > extremely specific cases that might or might not have anything to do > with what you're seeing. The complainant was extremely helpful about > tracking down the problems: > http://archives.postgresql.org/pgsql-general/2012-03/msg00957.php > http://archives.postgresql.org/pgsql-committers/2012-03/msg00225.php > http://archives.postgresql.org/pgsql-committers/2012-03/msg00230.php I'm wondering if these fixes (or today's commit) include the case for a database has ~100 thounsands of tables, indexes. One of my customers has had troubles with pg_dump for the database, it takes over 10 hours. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Here is a sample dump that takes a long time to be written by pg_dump:

http://postgresql.1045698.n5.nabble.com/file/n5710183/test.dump.tar.gz
(the file above is 2.4 MB; the dump itself is 66 MB)

This database has 2,311 schemas similar to those in my production database. All schemas are empty, but pg_dump still takes 3 hours to finish it on my computer. So now you can imagine my production database with more than 20,000 schemas like that. Can you guys take a look and see if the code has room for improvements?

I generated this dump with PostgreSQL 9.1 (which is what I have on my local computer), but my production database uses PostgreSQL 9.0. So it would be great if improvements could be delivered to version 9.0 as well.

Thanks a lot for all the help!

Hugo
On Sat, May 26, 2012 at 9:12 PM, Hugo <Nabble> <hugo.tech@gmail.com> wrote:
> Here is a sample dump that takes a long time to be written by pg_dump:
> http://postgresql.1045698.n5.nabble.com/file/n5710183/test.dump.tar.gz
> (the file above has 2.4Mb, the dump itself has 66Mb)
>
> This database has 2,311 schemas similar to those in my production database. All schemas are empty,

This dump does not reload cleanly. It uses many roles which it doesn't create. Also, the schemata are not empty; they have about 20 tables apiece. I created the missing roles with all default options.

Doing a default pg_dump took 66 minutes.

> but pg_dump still takes 3 hours to finish it on my computer. So now you can imagine my production database with more than 20,000 schemas like that. Can you guys take a look and see if the code has room for improvements?

There is a quadratic behavior in pg_dump's "mark_create_done". This should probably be fixed, but in the meantime it can be circumvented by using -Fc rather than -Fp for the dump format. Doing that removed 17 minutes from the run time.

I'm working on a patch to reduce the LockReassignCurrentOwner problem in the server when using pg_dump with lots of objects. Using a preliminary version of it, in conjunction with -Fc, reduced the dump time to 3.5 minutes.

Cheers,

Jeff
Jeff Janes <jeff.janes@gmail.com> writes:
> There is a quadratic behavior in pg_dump's "mark_create_done". This should probably be fixed, but in the meantime it can be circumvented by using -Fc rather than -Fp for the dump format. Doing that removed 17 minutes from the run time.

Hmm, that would just amount to postponing the work from pg_dump to pg_restore --- although I suppose it could be a win if the dump is for backup purposes and you probably won't ever have to restore it. inhibit_data_for_failed_table() has the same issue, though perhaps it's less likely to be exercised; and there is a previously noted O(N^2) behavior for the loop around repoint_table_dependencies.

We could fix these things by setting up index arrays that map dump ID to TocEntry pointer, and dump ID of a table to dump ID of its TABLE DATA TocEntry. The first of these already exists (tocsByDumpId) but is currently built only if doing parallel restore. We'd have to build it all the time to use it for fixing mark_create_done. Still, the extra space is small compared to the size of the TocEntry data structures, so I don't see that that's a serious objection.

I have nothing else to do right now so am a bit tempted to go fix this.

> I'm working on a patch to reduce the LockReassignCurrentOwner problem in the server when using pg_dump with lots of objects.

Cool.

            regards, tom lane
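(As a rough illustration of those index arrays, a sketch only: the tableDataId array, the buildTocEntryArrays name, and the way the TABLE DATA link is discovered here are assumptions, not the real implementation, and the archiver field names are given as best recalled.)

    /* arrays indexed by dump ID, built once so later lookups are O(1) */
    static TocEntry **tocsByDumpId; /* dump ID -> its TocEntry */
    static DumpId    *tableDataId;  /* table's dump ID -> dump ID of its TABLE DATA entry */

    static void
    buildTocEntryArrays(ArchiveHandle *AH)
    {
        TocEntry   *te;

        tocsByDumpId = (TocEntry **) calloc(AH->maxDumpId + 1, sizeof(TocEntry *));
        tableDataId = (DumpId *) calloc(AH->maxDumpId + 1, sizeof(DumpId));

        /* walk the circular TOC list once */
        for (te = AH->toc->next; te != AH->toc; te = te->next)
        {
            tocsByDumpId[te->dumpId] = te;

            /* remember which TABLE DATA entry belongs to which table */
            if (strcmp(te->desc, "TABLE DATA") == 0 && te->nDeps > 0)
                tableDataId[te->dependencies[0]] = te->dumpId;
        }
    }

    /*
     * mark_create_done() could then find the TABLE DATA entry directly:
     *     TocEntry *tes = tocsByDumpId[tableDataId[te->dumpId]];
     * instead of scanning the whole TOC list for it each time.
     */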
Thanks again for the hard work, guys.

When I said that the schemas were empty, I was talking about data, not tables. So you are right that each schema has ~20 tables (plus indices, sequences, etc.), but pretty much no data (maybe one or two rows at most). Data doesn't seem to be so important in this case (I may be wrong though), so the sample database should be enough to find the weak spots that need attention.

> but in the meantime it can be circumvented by using -Fc rather than -Fp for the dump format. Doing that removed 17 minutes from the run time.

We do use -Fc in our production server, but it doesn't help much (dump time is still > 24 hours). Actually, I tried several different dump options without success.

It seems that you guys are very close to great improvements here. Thanks for everything!

Best,
Hugo
>> We recently fixed a couple of O(N^2) loops in pg_dump, but those covered extremely specific cases that might or might not have anything to do with what you're seeing.
>
> I'm wondering if these fixes (or today's commit) include the case for a database that has ~100 thousand tables and indexes. One of my customers has had trouble with pg_dump for such a database; it takes over 10 hours.

So I did a quick test with old PostgreSQL 9.0.2 and current (as of commit 2755abf386e6572bad15cb6a032e504ad32308cc). In a freshly initdb-ed database I created 100,000 tables, each with two integer attributes, one of them a primary key. Creating the tables was reasonably fast as expected (18-20 minutes). This created a 1.4GB database cluster.

pg_dump dbname >/dev/null took 188 minutes on 9.0.2, which was the kind of long runtime the customer complained about. Now what about current? Well, it took 125 minutes. ps showed that most of the time was spent in the backend.

Below is the script to create the tables:

cnt=100000
while [ $cnt -gt 0 ]
do
    psql -e -p 5432 -c "create table t$cnt(i int primary key, j int);" test
    cnt=`expr $cnt - 1`
done

p.s. You need to increase max_locks_per_transaction before running pg_dump (I raised it to 640 in my case).

--
Tatsuo Ishii
SRA OSS, Inc. Japan
Tatsuo Ishii <ishii@postgresql.org> writes:
> So I did a quick test with old PostgreSQL 9.0.2 and current (as of commit 2755abf386e6572bad15cb6a032e504ad32308cc). In a freshly initdb-ed database I created 100,000 tables, each with two integer attributes, one of them a primary key.
>
> pg_dump dbname >/dev/null took 188 minutes on 9.0.2, which was the kind of long runtime the customer complained about. Now what about current? Well, it took 125 minutes. ps showed that most of the time was spent in the backend.

Yeah, Jeff's experiments indicated that the remaining bottleneck is lock management in the server. What I fixed so far on the pg_dump side should be enough to let partial dumps run at reasonable speed even if the whole database contains many tables. But if psql is taking AccessShareLock on lots of tables, there's still a problem.

            regards, tom lane
> Yeah, Jeff's experiments indicated that the remaining bottleneck is lock management in the server. What I fixed so far on the pg_dump side should be enough to let partial dumps run at reasonable speed even if the whole database contains many tables. But if psql is taking AccessShareLock on lots of tables, there's still a problem.

Yes, I saw lines like this:

29260 2012-05-30 09:39:19 JST LOG:  statement: LOCK TABLE public.t10 IN ACCESS SHARE MODE

It seems this is not a very efficient approach, since LOCK TABLE can take multiple tables as an argument and we could pass as many tables as possible to one LOCK TABLE query. This way we could reduce the communication between pg_dump and the backend.

Also I noticed lots of queries like these:

29260 2012-05-30 09:39:19 JST LOG:  statement: SELECT attname, attacl FROM pg_catalog.pg_attribute WHERE attrelid = '516391' AND NOT attisdropped AND attacl IS NOT NULL ORDER BY attnum

I guess this is issued for each table, and if there are tons of tables these queries become a major bottleneck as well as the LOCKs. I think we could optimize this somewhat by issuing queries that extract info for multiple tables, rather than extracting info for only one table as the current implementation does. Or, even better, we could create a temp table which contains the target table OIDs to join against in the query above.

In my opinion, a particular use case such as multi-tenancy would create tons of objects in a database cluster, and the performance of pg_dump might be highlighted more in the future.

--
Tatsuo Ishii
SRA OSS, Inc. Japan
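(As a sketch of that batching idea, not actual pg_dump code: the oidList variable is hypothetical and the exact query shape is an assumption. The per-table pg_attribute query above could, in principle, be collapsed into one query for a whole list of tables, built with libpq's PQExpBuffer as pg_dump already does for its other queries.)

    /*
     * oidList is assumed to hold a comma-separated list of table OIDs
     * collected earlier, e.g. "516391,516392,516393".
     */
    PQExpBuffer q = createPQExpBuffer();

    appendPQExpBuffer(q,
                      "SELECT attrelid, attname, attacl "
                      "FROM pg_catalog.pg_attribute "
                      "WHERE attrelid IN (%s) "
                      "AND NOT attisdropped AND attacl IS NOT NULL "
                      "ORDER BY attrelid, attnum",
                      oidList);

    /* one round trip; the result rows are then split out per attrelid client-side */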
> Yeah, Jeff's experiments indicated that the remaining bottleneck is lock management in the server. What I fixed so far on the pg_dump side should be enough to let partial dumps run at reasonable speed even if the whole database contains many tables. But if psql is taking AccessShareLock on lots of tables, there's still a problem.

Ok, I modified the part of pg_dump where a tremendous number of LOCK TABLE statements are issued. I replaced them with a single LOCK TABLE with multiple tables. With 100k tables the LOCK statements took 13 minutes in total; now it only takes 3 seconds. Comments?

--
Tatsuo Ishii
SRA OSS, Inc. Japan

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 3461f3e..cc1ffd7 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -3832,6 +3832,7 @@ getTables(Archive *fout, int *numTables)
 	int			i_reloptions;
 	int			i_toastreloptions;
 	int			i_reloftype;
+	bool		lock_needed = false;
 
 	/* Make sure we are in proper schema */
 	selectSourceSchema(fout, "pg_catalog");
@@ -4273,15 +4274,21 @@ getTables(Archive *fout, int *numTables)
 		 * NOTE: it'd be kinda nice to lock other relations too, not only
 		 * plain tables, but the backend doesn't presently allow that.
 		 */
-		if (tblinfo[i].dobj.dump && tblinfo[i].relkind == RELKIND_RELATION)
+		if (i == 0)
 		{
 			resetPQExpBuffer(query);
-			appendPQExpBuffer(query,
-							  "LOCK TABLE %s IN ACCESS SHARE MODE",
+			appendPQExpBuffer(query,"LOCK TABLE ");
+		}
+
+		if (tblinfo[i].dobj.dump && tblinfo[i].relkind == RELKIND_RELATION)
+		{
+			if (lock_needed)
+				appendPQExpBuffer(query,",");
+			appendPQExpBuffer(query,"%s",
 							  fmtQualifiedId(fout,
 											 tblinfo[i].dobj.namespace->dobj.name,
 											 tblinfo[i].dobj.name));
-			ExecuteSqlStatement(fout, query->data);
+			lock_needed = true;
 		}
 
 		/* Emit notice if join for owner failed */
@@ -4290,6 +4297,12 @@ getTables(Archive *fout, int *numTables)
 			tblinfo[i].dobj.name);
 	}
 
+	if (lock_needed)
+	{
+		appendPQExpBuffer(query, " IN ACCESS SHARE MODE");
+		ExecuteSqlStatement(fout, query->data);
+	}
+
 	if (lockWaitTimeout && fout->remoteVersion >= 70300)
 	{
 		ExecuteSqlStatement(fout, "SET statement_timeout = 0");
>> Yeah, Jeff's experiments indicated that the remaining bottleneck is lock management in the server. What I fixed so far on the pg_dump side should be enough to let partial dumps run at reasonable speed even if the whole database contains many tables. But if psql is taking AccessShareLock on lots of tables, there's still a problem.
>
> Ok, I modified the part of pg_dump where a tremendous number of LOCK TABLE statements are issued. I replaced them with a single LOCK TABLE with multiple tables. With 100k tables the LOCK statements took 13 minutes in total; now it only takes 3 seconds. Comments?

Shall I commit this to master and all supported branches?

--
Tatsuo Ishii
SRA OSS, Inc. Japan
Tatsuo Ishii <ishii@postgresql.org> writes:
>> Ok, I modified the part of pg_dump where a tremendous number of LOCK TABLE statements are issued. I replaced them with a single LOCK TABLE with multiple tables. With 100k tables the LOCK statements took 13 minutes in total; now it only takes 3 seconds. Comments?
>
> Shall I commit this to master and all supported branches?

I'm not excited by this patch. It dodges the O(N^2) lock behavior for the initial phase of acquiring the locks, but it does nothing for the lock-related slowdown occurring in all of pg_dump's subsequent commands. I think we really need to get in the server-side fix that Jeff Janes is working on, and then re-measure to see if something like this is still worth the trouble. I am also a tad concerned about whether we might not have problems with parsing memory usage, or some such, with thousands of tables being listed in a single command.

            regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Tatsuo Ishii <ishii@postgresql.org> writes:
>> Shall I commit this to master and all supported branches?
>
> I'm not excited by this patch. It dodges the O(N^2) lock behavior for the initial phase of acquiring the locks, but it does nothing for the lock-related slowdown occurring in all of pg_dump's subsequent commands. I think we really need to get in the server-side fix that Jeff Janes is working on, and then re-measure to see if something like this is still worth the trouble. I am also a tad concerned about whether we might not have problems with parsing memory usage, or some such, with thousands of tables being listed in a single command.

I can't imagine a case where it's actually better to incur the latency penalty (which is apparently on the order of *minutes* of additional time here..) than to worry about the potential memory usage of having to parse such a command.

If that's really a concern, where is that threshold, and could we simply cap pg_dump's operations based on it? Is 1000 alright? Doing a 'lock' with 1000 tables at a time is still going to be hugely better than doing them individually, and the amount of gain between every-1000 and all-at-once is likely to be pretty minimal anyway...

The current situation where the client-to-server latency accounts for multiple minutes of time is just ridiculous, however, so I feel we need some form of this patch, even if the server side is magically made much faster. The constant back-and-forth isn't cheap.

    Thanks,

        Stephen
[Attachment]
Stephen Frost <sfrost@snowman.net> writes:
> The current situation where the client-to-server latency accounts for multiple minutes of time is just ridiculous, however, so I feel we need some form of this patch, even if the server side is magically made much faster. The constant back-and-forth isn't cheap.

No, you're missing my point. I don't believe that client-to-server latency, or any other O(N) cost, has anything to do with the problem here. The problem, as Jeff has demonstrated, is the O(N^2) costs associated with management of the local lock table. It is utterly pointless to worry about O(N) costs until that's fixed; and it's just wrong to claim that you've created a significant speedup by eliminating a constant factor when all you've done is staved off occurrences of the O(N^2) problem.

Once we've gotten rid of the local lock table problem, we can re-measure and see what the true benefit of this patch is. I'm of the opinion that it will be in the noise compared to the overall runtime of pg_dump. I could be wrong, but you won't convince me of that with measurements taken while the local lock table problem is still there.

            regards, tom lane
> I'm not excited by this patch. It dodges the O(N^2) lock behavior for the initial phase of acquiring the locks, but it does nothing for the lock-related slowdown occurring in all of pg_dump's subsequent commands. I think we really need to get in the server-side fix that Jeff Janes is working on, and then re-measure to see if something like this is still worth the trouble.

Well, even with the current backend, locking 100,000 tables was done in 3 seconds in my test. So even if Jeff Janes's fix succeeds, I guess it will just save 3 seconds in my case, and if the number of tables is smaller, the saving will be smaller too. This suggests that most of the time for processing LOCK has been spent in communication between pg_dump and the backend. Of course this is just my guess, though.

> I am also a tad concerned about whether we might not have problems with parsing memory usage, or some such, with thousands of tables being listed in a single command.

That's easy to fix. Just divide each LOCK statement into multiple LOCK statements.

My big concern is that even if the locking part is fixed (either by Jeff Janes's fix or by mine), much time in pg_dump is still spent on SELECTs against the system catalogs. The fix would be to turn many SELECTs into a single SELECT, probably using a big IN clause for the table OIDs.

--
Tatsuo Ishii
SRA OSS, Inc. Japan
>> I'm wondering if these fixes (or today's commit) include the case for a database that has ~100 thousand tables and indexes. One of my customers has had trouble with pg_dump for such a database; it takes over 10 hours.
>
> So I did a quick test with old PostgreSQL 9.0.2 and current (as of commit 2755abf386e6572bad15cb6a032e504ad32308cc). In a freshly initdb-ed database I created 100,000 tables, each with two integer attributes, one of them a primary key. Creating the tables was reasonably fast as expected (18-20 minutes). This created a 1.4GB database cluster.
>
> pg_dump dbname >/dev/null took 188 minutes on 9.0.2, which was the kind of long runtime the customer complained about. Now what about current? Well, it took 125 minutes. ps showed that most of the time was spent in the backend.

Just for the record, I reran the test again with my single-LOCK patch, and now the total runtime of pg_dump is 113 minutes.

188 minutes (9.0) -> 125 minutes (git master) -> 113 minutes (with my patch).

So far, I'm glad to see 40% time savings at this point.

--
Tatsuo Ishii
SRA OSS, Inc. Japan
On Thu, May 31, 2012 at 10:45 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
> Just for the record, I reran the test again with my single-LOCK patch, and now the total runtime of pg_dump is 113 minutes.
> 188 minutes (9.0) -> 125 minutes (git master) -> 113 minutes (with my patch).
>
> So far, I'm glad to see 40% time savings at this point.

I see only 9.6% savings (100 * (113/125 - 1)). What am I missing?

Cheers

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
> On Thu, May 31, 2012 at 10:45 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>> Just for the record, I reran the test again with my single-LOCK patch, and now the total runtime of pg_dump is 113 minutes.
>> 188 minutes (9.0) -> 125 minutes (git master) -> 113 minutes (with my patch).
>>
>> So far, I'm glad to see 40% time savings at this point.
>
> I see only 9.6% savings (100 * (113/125 - 1)). What am I missing?

What I meant was (100 * (113/188 - 1)).

--
Tatsuo Ishii
SRA OSS, Inc. Japan
On Thu, May 31, 2012 at 4:07 PM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>> I see only 9.6% savings (100 * (113/125 - 1)). What am I missing?
>
> What I meant was (100 * (113/188 - 1)).

OK, my fault was to assume you wanted to measure only your part, while apparently you meant overall savings. But Tom had asked for separate measurements, if I understood him correctly. Also, that measurement of your change would go after the O(N^2) fix. It could actually turn out to be much more than 9%, because the overall time would be reduced even more dramatically. So it might actually be good for your fix to wait a bit. ;-)

Kind regards

robert
On Thu, May 31, 2012 at 11:17 AM, Robert Klemme <shortcutter@googlemail.com> wrote:
> OK, my fault was to assume you wanted to measure only your part, while apparently you meant overall savings. But Tom had asked for separate measurements, if I understood him correctly. Also, that measurement of your change would go after the O(N^2) fix. It could actually turn out to be much more than 9%, because the overall time would be reduced even more dramatically. So it might actually be good for your fix to wait a bit. ;-)

It's not clear whether Tom is already working on that O(N^2) fix in locking. I'm asking because it doesn't seem like a complicated patch; contributors may want to get working if not ;-)
Claudio Freire <klaussfreire@gmail.com> writes:
> It's not clear whether Tom is already working on that O(N^2) fix in locking.

I'm not; Jeff Janes is. But you shouldn't be holding your breath anyway, since it's 9.3 material at this point.

            regards, tom lane
On Thu, May 31, 2012 at 10:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Claudio Freire <klaussfreire@gmail.com> writes:
>> It's not clear whether Tom is already working on that O(N^2) fix in locking.
>
> I'm not; Jeff Janes is. But you shouldn't be holding your breath anyway, since it's 9.3 material at this point.

I agree we can't back-patch that change, but then I think we ought to consider back-patching some variant of Tatsuo's patch. Maybe it's not reasonable to thunk an arbitrary number of relation names in there on one line, but how about 1000 relations per LOCK statement or so? I guess we'd need to see how much that erodes the benefit, but we've certainly done back-branch rearrangements in pg_dump in the past to fix various kinds of issues, and this is pretty non-invasive.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, May 31, 2012 at 10:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm not; Jeff Janes is. But you shouldn't be holding your breath anyway, since it's 9.3 material at this point.
>
> I agree we can't back-patch that change, but then I think we ought to consider back-patching some variant of Tatsuo's patch. Maybe it's not reasonable to thunk an arbitrary number of relation names in there on one line, but how about 1000 relations per LOCK statement or so? I guess we'd need to see how much that erodes the benefit, but we've certainly done back-branch rearrangements in pg_dump in the past to fix various kinds of issues, and this is pretty non-invasive.

I am not convinced either that this patch will still be useful after Jeff's fix goes in, or that it provides any meaningful savings when you consider a complete pg_dump run. Yeah, it will make the lock acquisition phase faster, but that's not a big part of the runtime except in very limited scenarios (--schema-only, perhaps).

The performance patches we applied to pg_dump over the past couple weeks were meant to relieve pain in situations where the big server-side lossage wasn't the dominant factor in runtime (ie, partial dumps). But this one is targeting exactly that area, which is why it looks like a band-aid and not a fix to me.

            regards, tom lane
On Thu, May 31, 2012 at 10:50:51AM -0400, Tom Lane wrote:
> I am not convinced either that this patch will still be useful after Jeff's fix goes in, or that it provides any meaningful savings when you consider a complete pg_dump run. Yeah, it will make the lock acquisition phase faster, but that's not a big part of the runtime except in very limited scenarios (--schema-only, perhaps).

FYI, that is the pg_upgrade use case, and pg_dump/restore time is reportedly taking the majority of the time in many cases.

--
Bruce Momjian
On Thu, May 31, 2012 at 11:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The performance patches we applied to pg_dump over the past couple weeks were meant to relieve pain in situations where the big server-side lossage wasn't the dominant factor in runtime (ie, partial dumps). But this one is targeting exactly that area, which is why it looks like a band-aid and not a fix to me.

No, Tatsuo's patch attacks a phase dominated by latency in some setups. That it's also becoming slow currently because of the locking cost is irrelevant; with locking sped up, the patch should only improve that phase even further.

Imagine the current timeline:

* = locking
. = waiting

*.*.**.**.***.***.****.****.*****.****

Tatsuo's patch converts it to:

*.**************

The locking fix would turn the timeline into:

*.*.*.*.*.*.*

Tatsuo's patch would turn that into:

*******

And, as noted before, pg_dump --schema-only is a key bottleneck in pg_upgrade.
On Thu, May 31, 2012 at 10:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I am not convinced either that this patch will still be useful after Jeff's fix goes in, ...

But people on older branches are not going to GET Jeff's fix.

> or that it provides any meaningful savings when you consider a complete pg_dump run. Yeah, it will make the lock acquisition phase faster, but that's not a big part of the runtime except in very limited scenarios (--schema-only, perhaps).

That is not a borderline scenario, as others have also pointed out.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, May 31, 2012 at 11:04:12AM -0400, Robert Haas wrote:
> On Thu, May 31, 2012 at 10:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I am not convinced either that this patch will still be useful after Jeff's fix goes in, ...
>
> But people on older branches are not going to GET Jeff's fix.

FYI, if it got into Postgres 9.2, everyone upgrading to Postgres 9.2 would benefit because pg_upgrade uses the new cluster's pg_dumpall.

--
Bruce Momjian
Claudio Freire <klaussfreire@gmail.com> writes:
> On Thu, May 31, 2012 at 11:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The performance patches we applied to pg_dump over the past couple weeks were meant to relieve pain in situations where the big server-side lossage wasn't the dominant factor in runtime (ie, partial dumps). But this one is targeting exactly that area, which is why it looks like a band-aid and not a fix to me.
>
> No, Tatsuo's patch attacks a phase dominated by latency in some setups.

No, it does not. The reason it's a win is that it avoids the O(N^2) behavior in the server. Whether the bandwidth savings is worth worrying about cannot be proven one way or the other as long as that elephant is in the room.

            regards, tom lane
On Thu, May 31, 2012 at 12:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> No, Tatsuo's patch attacks a phase dominated by latency in some setups.
>
> No, it does not. The reason it's a win is that it avoids the O(N^2) behavior in the server. Whether the bandwidth savings is worth worrying about cannot be proven one way or the other as long as that elephant is in the room.

I understand that, but if the locking is fixed and made to be O(N) (and hence each table locking O(1)), then latency suddenly becomes the dominating factor.

I'm thinking, though, that pg_upgrade runs locally, contrary to pg_dump backups, so in that case latency would be negligible and Tatsuo's patch inconsequential.

I'm also wondering whether the ResourceOwner patch you've proposed would get negated by Tatsuo's patch, because suddenly a "portal" (IIRC) has a lot more locks than the ResourceOwner could accommodate, forcing a reversal to O(N^2) behavior. In that case, that patch would in fact be detrimental... huh... way to go 180
On Wed, May 30, 2012 at 2:06 AM, Tatsuo Ishii <ishii@postgresql.org> wrote: >> Yeah, Jeff's experiments indicated that the remaining bottleneck is lock >> management in the server. What I fixed so far on the pg_dump side >> should be enough to let partial dumps run at reasonable speed even if >> the whole database contains many tables. But if psql is taking >> AccessShareLock on lots of tables, there's still a problem. > > Ok, I modified the part of pg_dump where tremendous number of LOCK > TABLE are issued. I replace them with single LOCK TABLE with multiple > tables. With 100k tables LOCK statements took 13 minutes in total, now > it only takes 3 seconds. Comments? Could you rebase this? I tried doing it myself, but must have messed it up because it got slower rather than faster. Thanks, Jeff
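For readers following along, the change under discussion replaces pg_dump's one-LOCK-TABLE-statement-per-relation pattern with a single statement naming many relations. A rough sketch of the batching idea is below; it is illustrative only (pg_dump itself is written in C and the real statement construction lives in Tatsuo's patch), and the JDBC wrapper, the 1000-per-batch figure taken from Robert's suggestion, and the assumption that table names arrive already quoted are all assumptions, not the actual implementation.

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

// Illustrative sketch of batched LOCK TABLE statements (not pg_dump's code).
public final class BatchedLocks {
    static void lockAll(Connection conn, List<String> quotedTableNames) throws SQLException {
        final int batchSize = 1000;   // Robert's suggested figure; an assumption here
        conn.setAutoCommit(false);    // LOCK TABLE only works inside a transaction block
        try (Statement st = conn.createStatement()) {
            for (int i = 0; i < quotedTableNames.size(); i += batchSize) {
                List<String> batch =
                    quotedTableNames.subList(i, Math.min(i + batchSize, quotedTableNames.size()));
                // One statement to parse and one round trip per batch of 1000 tables,
                // instead of one LOCK TABLE per table. The lock mode is unchanged, and
                // the transaction still ends up holding one AccessShareLock per table,
                // so max_locks_per_transaction has to be sized for the full set either way.
                st.execute("LOCK TABLE " + String.join(", ", batch) + " IN ACCESS SHARE MODE");
            }
        }
    }
}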
On Sun, Jun 10, 2012 at 4:47 PM, Jeff Janes <jeff.janes@gmail.com> wrote: > On Wed, May 30, 2012 at 2:06 AM, Tatsuo Ishii <ishii@postgresql.org> wrote: >>> Yeah, Jeff's experiments indicated that the remaining bottleneck is lock >>> management in the server. What I fixed so far on the pg_dump side >>> should be enough to let partial dumps run at reasonable speed even if >>> the whole database contains many tables. But if psql is taking >>> AccessShareLock on lots of tables, there's still a problem. >> >> Ok, I modified the part of pg_dump where tremendous number of LOCK >> TABLE are issued. I replace them with single LOCK TABLE with multiple >> tables. With 100k tables LOCK statements took 13 minutes in total, now >> it only takes 3 seconds. Comments? > > Could you rebase this? I tried doing it myself, but must have messed > it up because it got slower rather than faster. OK, I found the problem. In fixing a merge conflict, I had it execute the query every time it appended a table, rather than just at the end. With my proposed patch in place, I find that for a full default dump your patch is slightly faster with < 300,000 tables, and slightly slower with > 300,000. The differences are generally <2% in either direction. When it comes to back-patching and partial dumps, I'm not really sure what to test. For the record, there is still quadratic behavior on the server, albeit with a much smaller constant factor than the Reassign one. It is in get_tabstat_entry. I don't know if it is worth working on that in isolation--if PG is going to try to accommodate 100s of thousands of tables, there probably needs to be a more general way to limit the memory used by all aspects of the rel caches. Cheers, Jeff
> On Sun, Jun 10, 2012 at 4:47 PM, Jeff Janes <jeff.janes@gmail.com> wrote: >> On Wed, May 30, 2012 at 2:06 AM, Tatsuo Ishii <ishii@postgresql.org> wrote: >>>> Yeah, Jeff's experiments indicated that the remaining bottleneck is lock >>>> management in the server. What I fixed so far on the pg_dump side >>>> should be enough to let partial dumps run at reasonable speed even if >>>> the whole database contains many tables. But if psql is taking >>>> AccessShareLock on lots of tables, there's still a problem. >>> >>> Ok, I modified the part of pg_dump where tremendous number of LOCK >>> TABLE are issued. I replace them with single LOCK TABLE with multiple >>> tables. With 100k tables LOCK statements took 13 minutes in total, now >>> it only takes 3 seconds. Comments? >> >> Could you rebase this? I tried doing it myself, but must have messed >> it up because it got slower rather than faster. > > OK, I found the problem. In fixing a merge conflict, I had it execute > the query every time it appended a table, rather than just at the end. > > With my proposed patch in place, I find that for a full default dump > your patch is slightly faster with < 300,000 tables, and slightly > slower with > 300,000. The differences are generally <2% in either > direction. When it comes to back-patching and partial dumps, I'm not > really sure what to test. > > For the record, there is still a quadratic performance on the server, > albeit with a much smaller constant factor than the Reassign one. It > is in get_tabstat_entry. I don't know if is worth working on that in > isolation--if PG is going to try to accommodate 100s of thousands of > table, there probably needs to be a more general way to limit the > memory used by all aspects of the rel caches. I would like to test your patch and w/without my patch. Could you please give me the patches? Or do you have your own git repository? -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
On Tue, Jun 12, 2012 at 1:54 AM, Tatsuo Ishii <ishii@postgresql.org> wrote: >> On Sun, Jun 10, 2012 at 4:47 PM, Jeff Janes <jeff.janes@gmail.com> wrote: >>> On Wed, May 30, 2012 at 2:06 AM, Tatsuo Ishii <ishii@postgresql.org> wrote: >>>>> Yeah, Jeff's experiments indicated that the remaining bottleneck is lock >>>>> management in the server. What I fixed so far on the pg_dump side >>>>> should be enough to let partial dumps run at reasonable speed even if >>>>> the whole database contains many tables. But if psql is taking >>>>> AccessShareLock on lots of tables, there's still a problem. >>>> >>>> Ok, I modified the part of pg_dump where tremendous number of LOCK >>>> TABLE are issued. I replace them with single LOCK TABLE with multiple >>>> tables. With 100k tables LOCK statements took 13 minutes in total, now >>>> it only takes 3 seconds. Comments? >>> >>> Could you rebase this? I tried doing it myself, but must have messed >>> it up because it got slower rather than faster. >> >> OK, I found the problem. In fixing a merge conflict, I had it execute >> the query every time it appended a table, rather than just at the end. >> >> With my proposed patch in place, I find that for a full default dump >> your patch is slightly faster with < 300,000 tables, and slightly >> slower with > 300,000. The differences are generally <2% in either >> direction. When it comes to back-patching and partial dumps, I'm not >> really sure what to test. >> >> For the record, there is still a quadratic performance on the server, >> albeit with a much smaller constant factor than the Reassign one. It >> is in get_tabstat_entry. I don't know if is worth working on that in >> isolation--if PG is going to try to accommodate 100s of thousands of >> table, there probably needs to be a more general way to limit the >> memory used by all aspects of the rel caches. > > I would like to test your patch and w/without my patch. Could you > please give me the patches? Or do you have your own git repository? The main patch is in the commit fest as "Resource Owner reassign Locks for the sake of pg_dump" My re-basing of your patch is attached. Cheers, Jeff
> On Tue, Jun 12, 2012 at 1:54 AM, Tatsuo Ishii <ishii@postgresql.org> wrote: >>> On Sun, Jun 10, 2012 at 4:47 PM, Jeff Janes <jeff.janes@gmail.com> wrote: >>>> On Wed, May 30, 2012 at 2:06 AM, Tatsuo Ishii <ishii@postgresql.org> wrote: >>>>>> Yeah, Jeff's experiments indicated that the remaining bottleneck is lock >>>>>> management in the server. What I fixed so far on the pg_dump side >>>>>> should be enough to let partial dumps run at reasonable speed even if >>>>>> the whole database contains many tables. But if psql is taking >>>>>> AccessShareLock on lots of tables, there's still a problem. >>>>> >>>>> Ok, I modified the part of pg_dump where tremendous number of LOCK >>>>> TABLE are issued. I replace them with single LOCK TABLE with multiple >>>>> tables. With 100k tables LOCK statements took 13 minutes in total, now >>>>> it only takes 3 seconds. Comments? >>>> >>>> Could you rebase this? I tried doing it myself, but must have messed >>>> it up because it got slower rather than faster. >>> >>> OK, I found the problem. In fixing a merge conflict, I had it execute >>> the query every time it appended a table, rather than just at the end. >>> >>> With my proposed patch in place, I find that for a full default dump >>> your patch is slightly faster with < 300,000 tables, and slightly >>> slower with > 300,000. The differences are generally <2% in either >>> direction. When it comes to back-patching and partial dumps, I'm not >>> really sure what to test. >>> >>> For the record, there is still a quadratic performance on the server, >>> albeit with a much smaller constant factor than the Reassign one. It >>> is in get_tabstat_entry. I don't know if is worth working on that in >>> isolation--if PG is going to try to accommodate 100s of thousands of >>> table, there probably needs to be a more general way to limit the >>> memory used by all aspects of the rel caches. >> >> I would like to test your patch and w/without my patch. Could you >> please give me the patches? Or do you have your own git repository? > > The main patch is in the commit fest as "Resource Owner reassign Locks > for the sake of pg_dump" > > My re-basing of your patch is attached. I tested your patches against the current master head. The result was pretty good. Before, it took 125 minutes (with 9.2 devel) to dump 100k empty tables, and now it takes less than 4 minutes!

$ time pg_dump test >/dev/null
real 3m56.412s
user 0m12.059s
sys 0m3.571s

Good job! Next I applied the rebased pg_dump patch.

real 4m1.779s
user 0m11.621s
sys 0m3.052s

Unfortunately I see no improvement. Probably my patch's value is for dumping against an older backend. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
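For anyone wanting to reproduce that 100k-empty-table timing, a minimal setup sketch follows. The table naming scheme, connection URL, and credentials are assumptions, not what Tatsuo actually used; PostgreSQL does accept zero-column tables, so the empty column list is valid.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Hypothetical helper: creates 100,000 empty tables in database "test"
// so that "time pg_dump test >/dev/null" can be measured afterwards.
public final class MakeManyTables {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:postgresql:test", "postgres", "");
             Statement st = conn.createStatement()) {
            conn.setAutoCommit(false);
            for (int i = 0; i < 100_000; i++) {
                st.execute("CREATE TABLE t" + i + " ()");
                if (i % 1_000 == 999) {
                    conn.commit();   // commit in batches so the shared lock table is not exhausted
                }
            }
            conn.commit();
        }
        // Then: time pg_dump test >/dev/null
    }
}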
Hi guys, I just want to let you know that we have created our own solution to dump and restore our databases. The tool was written in Java and the source is now on GitHub (MIT license): https://github.com/tig100/JdbcPgBackup

The main approach was to cache all database objects - schemas, tables, indexes, etc. Instead of having Postgres do joins between the pg_catalog tables (which include thousands of schemas in pg_namespace and millions of columns in pg_attribute), we do full table scans and then find which schema or table an object belongs to by looking it up in a hash map in Java, based on schema and table OIDs. The dump is not transactionally safe, so it should be performed on a replica db only (WAL importing disabled), not on a live db.

Some statistics: Dump 11,000 schemas = 3 hours. Dump 28,000 schemas = 8 hours.

You can read more about the tool on the GitHub page. Best regards, Hugo -- View this message in context: http://postgresql.1045698.n5.nabble.com/pg-dump-and-thousands-of-schemas-tp5709766p5718532.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
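As a rough illustration of the client-side join Hugo describes (this is not the JdbcPgBackup source, which is on the GitHub page above; the class name, connection URL, and the relkind filter are assumptions), the idea looks roughly like this:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.HashMap;
import java.util.Map;

// Sketch of caching catalog rows and joining them in application memory.
public final class CatalogCache {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:postgresql:mydb")) {
            // One full scan of pg_namespace: OID -> schema name.
            Map<Long, String> schemaByOid = new HashMap<>();
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("SELECT oid, nspname FROM pg_namespace")) {
                while (rs.next()) {
                    schemaByOid.put(rs.getLong(1), rs.getString(2));
                }
            }
            // One full scan of pg_class; each table's schema is resolved from the
            // in-memory map instead of asking the server to join pg_namespace.
            Map<Long, String> qualifiedNameByTableOid = new HashMap<>();
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery(
                         "SELECT oid, relname, relnamespace FROM pg_class WHERE relkind = 'r'")) {
                while (rs.next()) {
                    String schema = schemaByOid.get(rs.getLong(3));
                    qualifiedNameByTableOid.put(rs.getLong(1), schema + "." + rs.getString(2));
                }
            }
            System.out.println("cached " + qualifiedNameByTableOid.size() + " tables");
        }
    }
}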
On Thu, May 31, 2012 at 09:20:43AM +0900, Tatsuo Ishii wrote: > >> Yeah, Jeff's experiments indicated that the remaining bottleneck is lock > >> management in the server. What I fixed so far on the pg_dump side > >> should be enough to let partial dumps run at reasonable speed even if > >> the whole database contains many tables. But if psql is taking > >> AccessShareLock on lots of tables, there's still a problem. > > > > Ok, I modified the part of pg_dump where tremendous number of LOCK > > TABLE are issued. I replace them with single LOCK TABLE with multiple > > tables. With 100k tables LOCK statements took 13 minutes in total, now > > it only takes 3 seconds. Comments? > > Shall I commit to master and all supported branches? Was this applied? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Bruce Momjian <bruce@momjian.us> writes: > On Thu, May 31, 2012 at 09:20:43AM +0900, Tatsuo Ishii wrote: >>> Ok, I modified the part of pg_dump where tremendous number of LOCK >>> TABLE are issued. I replace them with single LOCK TABLE with multiple >>> tables. With 100k tables LOCK statements took 13 minutes in total, now >>> it only takes 3 seconds. Comments? >> Shall I commit to master and all supported branches? > Was this applied? No, we fixed the server side instead. regards, tom lane
On Thu, Aug 30, 2012 at 04:51:56PM -0400, Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > On Thu, May 31, 2012 at 09:20:43AM +0900, Tatsuo Ishii wrote: > >>> Ok, I modified the part of pg_dump where tremendous number of LOCK > >>> TABLE are issued. I replace them with single LOCK TABLE with multiple > >>> tables. With 100k tables LOCK statements took 13 minutes in total, now > >>> it only takes 3 seconds. Comments? > > >> Shall I commit to master and all supported branches? > > > Was this applied? > > No, we fixed the server side instead. Again, thanks. I knew we fixed the server, but wasn't clear that made the client changes unnecessary, but I think I now do remember discussion about that. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Thu, Aug 30, 2012 at 4:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Bruce Momjian <bruce@momjian.us> writes: >> On Thu, May 31, 2012 at 09:20:43AM +0900, Tatsuo Ishii wrote: >>>> Ok, I modified the part of pg_dump where tremendous number of LOCK >>>> TABLE are issued. I replace them with single LOCK TABLE with multiple >>>> tables. With 100k tables LOCK statements took 13 minutes in total, now >>>> it only takes 3 seconds. Comments? > >>> Shall I commit to master and all supported branches? > >> Was this applied? > > No, we fixed the server side instead. But only for 9.2, right? So people running back branches are still screwed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Aug 30, 2012 at 4:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Bruce Momjian <bruce@momjian.us> writes: >>> On Thu, May 31, 2012 at 09:20:43AM +0900, Tatsuo Ishii wrote: >>>> Ok, I modified the part of pg_dump where tremendous number of LOCK >>>> TABLE are issued. I replace them with single LOCK TABLE with multiple >>>> tables. With 100k tables LOCK statements took 13 minutes in total, now >>>> it only takes 3 seconds. Comments? >>> Was this applied? >> No, we fixed the server side instead. > But only for 9.2, right? So people running back branches are still screwed. Yeah, but they're screwed anyway, because there are a bunch of O(N^2) behaviors involved here, not all of which are masked by what Tatsuo-san suggested. Six months or a year from now, we might have enough confidence in that batch of 9.2 fixes to back-port them en masse. Don't want to do it today though. regards, tom lane
On Thu, Aug 30, 2012 at 8:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Thu, Aug 30, 2012 at 4:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Bruce Momjian <bruce@momjian.us> writes: >>>> On Thu, May 31, 2012 at 09:20:43AM +0900, Tatsuo Ishii wrote: >>>>> Ok, I modified the part of pg_dump where tremendous number of LOCK >>>>> TABLE are issued. I replace them with single LOCK TABLE with multiple >>>>> tables. With 100k tables LOCK statements took 13 minutes in total, now >>>>> it only takes 3 seconds. Comments? > >>>> Was this applied? > >>> No, we fixed the server side instead. > >> But only for 9.2, right? So people running back branches are still screwed. > > Yeah, but they're screwed anyway, because there are a bunch of O(N^2) > behaviors involved here, not all of which are masked by what Tatsuo-san > suggested. All of the other ones that I know of were associated with pg_dump itself, and since it is recommended to run the newer version of pg_dump against the older version of the server, no back patching would be necessary to get the benefits of those particular fixes. > Six months or a year from now, we might have enough confidence in that > batch of 9.2 fixes to back-port them en masse. Don't want to do it > today though. What would be the recommendation for people trying to upgrade, but who can't get their data out in a reasonable window? Putting Tatsuo-san's change into a future pg_dump might be more conservative than back-porting the server's Lock Table change to the server version they are trying to get rid of. Cheers, Jeff
On Sun, Sep 2, 2012 at 5:39 PM, Jeff Janes <jeff.janes@gmail.com> wrote: > On Thu, Aug 30, 2012 at 8:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> On Thu, Aug 30, 2012 at 4:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> Bruce Momjian <bruce@momjian.us> writes: >>>>> On Thu, May 31, 2012 at 09:20:43AM +0900, Tatsuo Ishii wrote: >>>>>> Ok, I modified the part of pg_dump where tremendous number of LOCK >>>>>> TABLE are issued. I replace them with single LOCK TABLE with multiple >>>>>> tables. With 100k tables LOCK statements took 13 minutes in total, now >>>>>> it only takes 3 seconds. Comments? >> >>>>> Was this applied? >> >>>> No, we fixed the server side instead. >> >>> But only for 9.2, right? So people running back branches are still screwed. >> >> Yeah, but they're screwed anyway, because there are a bunch of O(N^2) >> behaviors involved here, not all of which are masked by what Tatsuo-san >> suggested. > > All of the other ones that I know of were associated with pg_dump > itself, and since it is recommended to run the newer version of > pg_dump against the older version of the server, no back patching > would be necessary to get the benefits of those particular fixes. > >> Six months or a year from now, we might have enough confidence in that >> batch of 9.2 fixes to back-port them en masse. Don't want to do it >> today though. > > > What would be the recommendation for people trying to upgrade, but who > can't get their data out in a reasonable window? > > Putting Tatsuo-san's change into a future pg_dump might be more > conservative than back-porting the server's Lock Table change to the > server version they are trying to get rid of. What he said. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
I've read all the posts in this thread, and as I understand it, in version 9.2 some patches were applied to improve pg_dump speed. I've just installed PostgreSQL 9.2.1 and I still have the same problem. I have a database with 2600 schemas in it. I try to dump each schema individually, but it takes too much time for every schema (about 30-40 seconds per schema, no matter what the data size is). Also for each schema dump I have a slow query log entry, here is an example:

>2012-11-06 13:15:32 GMT LOG: duration: 12029.334 ms statement:
    SELECT c.tableoid, c.oid, c.relname, c.relacl, c.relkind, c.relnamespace,
        (SELECT rolname FROM pg_catalog.pg_roles WHERE oid = c.relowner) AS rolname,
        c.relchecks, c.relhastriggers, c.relhasindex, c.relhasrules, c.relhasoids, c.relfrozenxid,
        tc.oid AS toid, tc.relfrozenxid AS tfrozenxid, c.relpersistence,
        CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype,
        d.refobjid AS owning_tab, d.refobjsubid AS owning_col,
        (SELECT spcname FROM pg_tablespace t WHERE t.oid = c.reltablespace) AS reltablespace,
        array_to_string(c.reloptions, ', ') AS reloptions,
        array_to_string(array(SELECT 'toast.' || x FROM unnest(tc.reloptions) x), ', ') AS toast_reloptions
    FROM pg_class c
    LEFT JOIN pg_depend d ON (c.relkind = 'S' AND d.classid = c.tableoid AND d.objid = c.oid AND d.objsubid = 0 AND d.refclassid = c.tableoid AND d.deptype = 'a')
    LEFT JOIN pg_class tc ON (c.reltoastrelid = tc.oid)
    WHERE c.relkind in ('r', 'S', 'v', 'c', 'f')
    ORDER BY c.oid

-- View this message in context: http://postgresql.1045698.n5.nabble.com/pg-dump-and-thousands-of-schemas-tp5709766p5730864.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
Denis <socsam@gmail.com> writes: > I've read all the posts in thread, and as I understood in version 9.2 some > patches were applied to improve pg_dump speed. I've just installed > PostgreSQL 9.2.1 and I still have the same problem. I have a database with > 2600 schemas in it. I try to dump each schema individually, but it takes too > much time for every schema (about 30-40 seconds per schema, no matter what > the data size is). Could you provide a test case for that? Maybe the output of pg_dump -s, anonymized as you see fit? > Also for each schema dump I have a slow query log entry, Could you provide EXPLAIN ANALYZE output for that query? regards, tom lane
Tom Lane-2 wrote > Denis <socsam@gmail.com> writes: >> I've read all the posts in thread, and as I understood in version 9.2 >> some >> patches were applied to improve pg_dump speed. I've just installed >> PostgreSQL 9.2.1 and I still have the same problem. I have a database >> with >> 2600 schemas in it. I try to dump each schema individually, but it takes >> too >> much time for every schema (about 30-40 seconds per schema, no matter >> what >> the data size is). > > Could you provide a test case for that? Maybe the output of pg_dump -s, > anonymized as you see fit? > >> Also for each schema dump I have a slow query log entry, > > Could you provide EXPLAIN ANALYZE output for that query? > > regards, tom lane

Here is the output of EXPLAIN ANALYZE. It took 5 seconds, but usually it takes from 10 to 15 seconds when I am doing a backup.

Sort  (cost=853562.04..854020.73 rows=183478 width=219) (actual time=5340.477..5405.604 rows=183924 loops=1)
  Sort Key: c.oid
  Sort Method: external merge  Disk: 33048kB
  ->  Hash Left Join  (cost=59259.80..798636.25 rows=183478 width=219) (actual time=839.297..4971.299 rows=183924 loops=1)
        Hash Cond: (c.reltoastrelid = tc.oid)
        ->  Hash Right Join  (cost=29530.77..146976.65 rows=183478 width=183) (actual time=404.959..3261.462 rows=183924 loops=1)
              Hash Cond: ((d.classid = c.tableoid) AND (d.objid = c.oid) AND (d.refclassid = c.tableoid))
              Join Filter: (c.relkind = 'S'::"char")
              ->  Seq Scan on pg_depend d  (cost=0.00..71403.54 rows=995806 width=20) (actual time=1.137..878.571 rows=998642 loops=1)
                    Filter: ((objsubid = 0) AND (deptype = 'a'::"char"))
                    Rows Removed by Filter: 2196665
              ->  Hash  (cost=21839.91..21839.91 rows=183478 width=175) (actual time=402.947..402.947 rows=183924 loops=1)
                    Buckets: 1024  Batches: 32  Memory Usage: 876kB
                    ->  Seq Scan on pg_class c  (cost=0.00..21839.91 rows=183478 width=175) (actual time=0.017..267.614 rows=183924 loops=1)
                          Filter: (relkind = ANY ('{r,S,v,c,f}'::"char"[]))
                          Rows Removed by Filter: 383565
        ->  Hash  (cost=18333.79..18333.79 rows=560979 width=40) (actual time=434.258..434.258 rows=567489 loops=1)
              Buckets: 4096  Batches: 32  Memory Usage: 703kB
              ->  Seq Scan on pg_class tc  (cost=0.00..18333.79 rows=560979 width=40) (actual time=0.003..273.418 rows=567489 loops=1)
        SubPlan 1
          ->  Seq Scan on pg_authid  (cost=0.00..1.01 rows=1 width=68) (actual time=0.001..0.001 rows=1 loops=183924)
                Filter: (oid = c.relowner)
                Rows Removed by Filter: 2
        SubPlan 2
          ->  Seq Scan on pg_tablespace t  (cost=0.00..1.02 rows=1 width=64) (actual time=0.001..0.001 rows=0 loops=183924)
                Filter: (oid = c.reltablespace)
                Rows Removed by Filter: 2
        SubPlan 3
          ->  Function Scan on unnest x  (cost=0.00..1.25 rows=100 width=32) (actual time=0.001..0.001 rows=0 loops=183924)
Total runtime: 5428.498 ms

Here is the output of "pg_dump -s" test.dump <http://postgresql.1045698.n5.nabble.com/file/n5730877/test.dump> -- View this message in context: http://postgresql.1045698.n5.nabble.com/pg-dump-and-thousands-of-schemas-tp5709766p5730877.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
Denis <socsam@gmail.com> writes: > Here is the output of EXPLAIN ANALYZE. It took 5 seconds but usually it > takes from 10 to 15 seconds when I am doing backup. > Sort (cost=853562.04..854020.73 rows=183478 width=219) (actual > time=5340.477..5405.604 rows=183924 loops=1) Hmmm ... so the problem here isn't that you've got 2600 schemas, it's that you've got 183924 tables. That's going to take some time no matter what. It does seem like we could make some small changes to optimize that query a little bit, but they're not going to result in any amazing improvement overall, because pg_dump still has to deal with all the tables it's getting back. Fundamentally, I would ask whether you really need so many tables. It seems pretty likely that you have lots and lots of basically-identical tables. Usually it would be better to redesign such a structure into fewer tables with more index columns. > Here is the output of "pg_dump -s" test.dump > <http://postgresql.1045698.n5.nabble.com/file/n5730877/test.dump> This dump contains only 1 schema and 43 tables, so I don't think it's for the database you're having trouble with ... regards, tom lane
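As a hedged sketch of the redesign Tom is suggesting (the table, column, and index names here are invented for illustration and are not taken from the reporter's schema): one shared table carries a customer key with an index on it, so per-customer extraction becomes a WHERE clause instead of a per-schema dump.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Hypothetical example of "fewer tables with more index columns": one shared
// table keyed by customer_id instead of an identical table in every schema.
public final class SharedTableSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:postgresql:mydb");
             Statement st = conn.createStatement()) {
            st.execute("CREATE TABLE comments (" +
                       "  customer_id integer NOT NULL," +
                       "  author      integer," +
                       "  body        text NOT NULL," +
                       "  created_at  timestamp NOT NULL DEFAULT now())");
            // The index column takes over the role that the per-customer schema played.
            st.execute("CREATE INDEX comments_customer_idx ON comments (customer_id)");
            // A per-customer export is then a filtered COPY rather than a schema dump, e.g.:
            //   COPY (SELECT * FROM comments WHERE customer_id = 42) TO STDOUT;
        }
    }
}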
Tom Lane-2 wrote > Denis <socsam@gmail.com> writes: >> Here is the output of EXPLAIN ANALYZE. It took 5 seconds but usually it >> takes from 10 to 15 seconds when I am doing backup. > >> Sort (cost=853562.04..854020.73 rows=183478 width=219) (actual >> time=5340.477..5405.604 rows=183924 loops=1) > > Hmmm ... so the problem here isn't that you've got 2600 schemas, it's > that you've got 183924 tables. That's going to take some time no matter > what. > > It does seem like we could make some small changes to optimize that > query a little bit, but they're not going to result in any amazing > improvement overall, because pg_dump still has to deal with all the > tables it's getting back. Fundamentally, I would ask whether you really > need so many tables. It seems pretty likely that you have lots and lots > of basically-identical tables. Usually it would be better to redesign > such a structure into fewer tables with more index columns. > >> Here is the output of "pg_dump -s" test.dump >> <http://postgresql.1045698.n5.nabble.com/file/n5730877/test.dump> > > This dump contains only 1 schema and 43 tables, so I don't think it's > for the database you're having trouble with ... > > regards, tom lane I wonder why pg_dump has to deal with all these 183924 tables if I specified that it should dump only one schema ("pg_dump -n schema_name"), or even just one table ("pg_dump -t 'schema_name.comments'")? We have a web application where we create a schema with a number of tables in it for each customer. This architecture was chosen to ease the process of backup/restoring data. Sometimes clients ask us to restore data for the last month or roll back to last week's state. This task is easy to accomplish when the client's data is isolated in a schema/DB. If we put all the clients' data in one table, operations of this kind will be much harder to perform. We will have to restore a huge DB with enormously large tables in it to find the requested data. Different clients have different activity rates, and we can select different backup strategies accordingly. This would be impossible if we kept all the clients' data in one table. Besides all of the above, the probability of massive data corruption (if an error occurs in our web application) is much higher. -- View this message in context: http://postgresql.1045698.n5.nabble.com/pg-dump-and-thousands-of-schemas-tp5709766p5730998.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
Denis <socsam@gmail.com> writes: > Tom Lane-2 wrote >> Hmmm ... so the problem here isn't that you've got 2600 schemas, it's >> that you've got 183924 tables. That's going to take some time no matter >> what. > I wonder why pg_dump has to have deal with all these 183924 tables, if I > specified to dump only one scheme: "pg_dump -n schema_name" or even like > this to dump just one table "pg_dump -t 'schema_name.comments' " ? It has to know about all the tables even if it's not going to dump them all, for purposes such as dependency analysis. > We have a web application where we create a schema with a number of tables > in it for each customer. This architecture was chosen to ease the process of > backup/restoring data. I find that argument fairly dubious, but in any case you should not imagine that hundreds of thousands of tables are going to be cost-free. regards, tom lane
Tom Lane-2 wrote > Denis <socsam@gmail.com> writes: >> Tom Lane-2 wrote >>> Hmmm ... so the problem here isn't that you've got 2600 schemas, it's >>> that you've got 183924 tables. That's going to take some time no matter >>> what. > >> I wonder why pg_dump has to have deal with all these 183924 tables, if I >> specified to dump only one scheme: "pg_dump -n schema_name" or even like >> this to dump just one table "pg_dump -t 'schema_name.comments' " ? > > It has to know about all the tables even if it's not going to dump them > all, for purposes such as dependency analysis. > >> We have a web application where we create a schema with a number of >> tables >> in it for each customer. This architecture was chosen to ease the process >> of >> backup/restoring data. > > I find that argument fairly dubious, but in any case you should not > imagine that hundreds of thousands of tables are going to be cost-free. > > regards, tom lane Still I can't understand why pg_dump has to know about all the tables. For example, I have this simple table:

CREATE TABLE "CLog" (
    "fromUser" integer,
    "toUser" integer,
    message character varying(2048) NOT NULL,
    "dateSend" timestamp without time zone NOT NULL
);

It has no foreign keys, it doesn't use partitioning, and it doesn't have any relation to any other table. Why does pg_dump have to gather information about ALL the tables in the database just to dump this one table? -- View this message in context: http://postgresql.1045698.n5.nabble.com/pg-dump-and-thousands-of-schemas-tp5709766p5731188.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
On Thu, Nov 8, 2012 at 1:04 AM, Denis <socsam@gmail.com> wrote: > > Still I can't understand why pg_dump has to know about all the tables? Strictly speaking it probably doesn't need to. But it is primarily designed for dumping entire databases, and the efficient way to do that is to read it all into memory in a few queries and then sort out the dependencies, rather than tracking down every dependency individually with one or more trips back to the database. (Although it still does make plenty of trips back to the database per table/sequence, for ACLs, defaults, and attributes.) If you were to rewrite pg_dump from the ground up to achieve your specific needs (dumping one schema, with no dependencies to other schemata) you could probably make it much more efficient. But then it wouldn't be pg_dump, it would be something else. Cheers, Jeff
Jeff Janes wrote > On Thu, Nov 8, 2012 at 1:04 AM, Denis <socsam@gmail.com> wrote: >> >> Still I can't understand why pg_dump has to know about all the tables? > > Strictly speaking it probably doesn't need to. But it is primarily > designed for dumping entire databases, and the efficient way to do > that is to read it all into memory in a few queries and then sort out > the dependencies, rather than tracking down every dependency > individually with one or more trips back to the database. (Although > it still does make plenty of trips back to the database per > table/sequence, for ACLs, defaults, and attributes.) > > If you were to rewrite pg_dump from the ground up to achieve your > specific needs (dumping one schema, with no dependencies to > other schemata) you could probably make it much more efficient. But > then it wouldn't be pg_dump, it would be something else. > > Cheers, > > Jeff Please don't think that I'm trying to nitpick here, but pg_dump has options for dumping separate tables and that's not really consistent with the idea that "pg_dump is primarily designed for dumping entire databases". -- View this message in context: http://postgresql.1045698.n5.nabble.com/pg-dump-and-thousands-of-schemas-tp5709766p5731900.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
On Tue, Nov 13, 2012 at 7:12 PM, Denis <socsam@gmail.com> wrote: > Jeff Janes wrote >> On Thu, Nov 8, 2012 at 1:04 AM, Denis < > >> socsam@ > >> > wrote: >>> >>> Still I can't undesrtand why pg_dump has to know about all the tables? >> >> Strictly speaking it probably doesn't need to. But it is primarily >> designed for dumping entire databases, and the efficient way to do >> that is to read it all into memory in a few queries and then sort out >> the dependencies, rather than tracking down every dependency >> individually with one or more trips back to the database. (Although >> it still does make plenty of trips back to the database per >> table/sequence, for acls, defaults, attributes. >> >> If you were to rewrite pg_dump from the ground up to achieve your >> specific needs (dumping one schema, with no dependencies between to >> other schemata) you could probably make it much more efficient. But >> then it wouldn't be pg_dump, it would be something else. >> > > Please don't think that I'm trying to nitpick here, but pg_dump has options > for dumping separate tables and that's not really consistent with the idea > that "pg_dump is primarily designed for dumping entire databases". I think it is compatible. From my reading of pg_dump, those other options seem to have been bolted on as an afterthought, not as part of its primary design. Cheers, Jeff
On 11/13/2012 10:12 PM, Denis wrote: > Please don't think that I'm trying to nitpick here, but pg_dump has options > for dumping separate tables and that's not really consistent with the idea > that "pg_dump is primarily designed for dumping entire databases". > > Sure it is. The word "primarily" is not just a noise word here. The fact that we have options to do other things doesn't mean that its primary design goal has changed. cheers andrew