Thread: pg_dump on older version of postgres eating huge amounts of memory
I'm in a bit of a bad spot. We have a number of remotely deployed postgres servers. We are very happy with the dependability of postgres, but have recently run into a problem with pg_dump on SOME of the servers.

When doing a pg_dump databasename > file.dmp, pg_dump eats up all the memory and is eventually killed because of its huge memory footprint. The dump file is always 0 bytes. It works perfectly fine on our other servers. When we run pg_dump in verbose mode, the last thing it reports is that it is fetching the trigger information.

Does anybody have any clue what could be happening, or how I can help narrow down the problem? The machines are only accessible to us by modem dialup.

Postgres version 7.1.2 (yes, I know it's ancient, but again, these are a number of servers remotely deployed...), Red Hat 7.1 (see above). I know this is an old version of postgres, but I would really appreciate any help I can get trying to track this down.

Steve
Re: pg_dump on older version of postgres eating huge amounts of memory
From: Martijn van Oosterhout
On Thu, Mar 18, 2004 at 07:37:54PM -0500, Steve Krall wrote:
> I'm in a bit of a bad spot. We have a number of remotely deployed
> postgres servers. We are very happy with the dependability of postgres,
> but have recently run into a problem with pg_dump in SOME of the servers.
>
> When doing a pg_dump databasename > file.dmp pg_dump eats up all the
> memory, and is eventually killed because it takes
> up a huge amount of memory. The dump file is always 0 bytes. It works
> perfectly fine on our other server.

The only reason I can remember pg_dump running out of memory was when using the -d or -D options on large tables. This used to be an issue but has long since been fixed. If this is the problem, you can use a 7.4 pg_dump against a 7.1 database; the output might not be reloadable into a 7.1 DB, though. Or don't use -d or -D.

Or this is not the problem at all...

Hope this helps,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Steve Krall <swalker@iglou.com> writes:
> When doing a pg_dump databasename > file.dmp pg_dump eats up all the
> memory, and is eventually killed because it takes
> up a huge amount of memory. The dump file is always 0 bytes. It works
> perfectly fine on our other server.
>
> When we run pg_dump in verbose mode, it is trying to get the trigger
> information.

Can you run it with query logging turned on, so we can see what query(s) it's issuing while it eats RAM?

regards, tom lane
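For anyone trying to reproduce this: in the 7.1-era server, the per-query logging Tom asks for was controlled by the debug settings in postgresql.conf (the log_statement machinery of later releases didn't exist yet). A sketch from memory, so treat the exact option names and values as assumptions to check against your installed docs:

```
# postgresql.conf, PostgreSQL 7.1-era settings (names from memory)
debug_print_query = true    # write the text of each query to the server log
debug_level = 2             # optional: more verbose backend debug output
```

Follow this with a postmaster restart to be safe; 7.1 predates the convenient reload behavior of later releases for most settings.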
I've done this with the debug level all the way up. You can get the file here (20 megs uncompressed, 130K compressed):

http://www.papajohns.com/postgres/postgres.log.bz2

While this dump was running, top reported that pg_dump was taking up around 500-550 megs. Then the machine stopped responding.

If you need anything else, just ask.

Steve

On Thu, 18 Mar 2004, Tom Lane wrote:
> Can you run it with query logging turned on, so we can see what query(s)
> it's issuing while it eats RAM?
>
> regards, tom lane
We do have quite a few triggers. Some are RI triggers postgres automagically creates. The remaining triggers are used for things like audit tables, or logging who updated a row, etc...

I did what you suggested. I ran pg_dump dbname > file.dmp and, immediately afterwards, kill -ABRT pid. I ended up with a 300 meg core file. Just viewing the file, it has a ton of DROP TRIGGER statements and only a few CREATE TRIGGER statements:

[root@r-and-d /tmp]# strings core | grep "DROP TRIGGER" | wc -l
5450219
[root@r-and-d /tmp]# strings core | grep "CREATE TRIGGER" | wc -l
466

We have a lot of triggers, but not 5.5 million :) So I took a couple of tables and looked at the number of triggers for each:

[root@r-and-d /tmp]# strings core | grep "DROP TRIGGER .* \"data_owner\"" | sort | uniq | wc -l
107
[root@r-and-d /tmp]# strings core | grep "DROP TRIGGER .* \"data_owner\"" | wc -l
347550
[root@r-and-d /tmp]# strings core | grep "DROP TRIGGER .* \"product\"" | sort | uniq | wc -l
151
[root@r-and-d /tmp]# strings core | grep "DROP TRIGGER .* \"product\"" | wc -l
346696

The backtrace of the core matches the getTables() loop you describe:

(gdb) bt
#0  0x401d7322 in __libc_send () from /lib/i686/libc.so.6
#1  0x40020675 in pqFlush () at eval.c:41
#2  0x4001e127 in PQsendQuery () at eval.c:41
#3  0x4001eb21 in PQexec () at eval.c:41
#4  0x0804d312 in getTables () at eval.c:41
#5  0x0805283e in dumpSchema () at eval.c:41
#6  0x0804adb2 in main () at eval.c:41
#7  0x4010c177 in __libc_start_main (main=0x804a5f4 <main>, argc=2,
    ubp_av=0xbfffe204, init=0x804954c <_init>, fini=0x8059c50 <_fini>,
    rtld_fini=0x4000e184 <_dl_fini>, stack_end=0xbfffe1fc)
    at ../sysdeps/generic/libc-start.c:129
(gdb)

Again, this happens only on some of the servers. But we downloaded the entire db directory from one of those servers and ran postgres against that directory, and we see the same results. So it would seem that something in the data causes pg_dump to freak out...
Is there a query that I can run against the system catalog that would help?

Steve

On Fri, 19 Mar 2004, Tom Lane wrote:
> Hmm. The trace looks completely unexceptional --- it's just running
> through your tables collecting index and trigger info (the loop in
> getTables() in pg_dump.c). You do seem to have rather a lot of
> triggers, but not 500 megs worth.
> ...
> The next move I can think of is to "kill -ABRT" the pg_dump run after
> it's gotten to some moderate size (50Meg at most) and then manually poke
> through the resulting core file to get a sense of what it's filling
> memory with.
>
> regards, tom lane
Steve Krall <swalker@iglou.com> writes:
> You can get the file here (20 megs uncompressed, 130K compressed):
> http://www.papajohns.com/postgres/postgres.log.bz2
> While this dump was running, top reported that pg_dump was taking up
> around 500-550megs. Then the machine stopped responding.

Hmm. The trace looks completely unexceptional --- it's just running through your tables collecting index and trigger info (the loop in getTables() in pg_dump.c). You do seem to have rather a lot of triggers, but not 500 megs' worth.

Digging in the 7.1 source code, I notice that there is a small leak in this loop: the query results from the two index-related queries are never freed. There should be a "PQclear(res2);" at line 2313 and another at line 2386. (Each at the end of the scope of the "res2" local variables; the line numbers might be a bit different in 7.1.2 than in the 7.1.3 code I'm looking at.) However, the trace shows that these queries are executed a couple hundred times apiece, and the waste from the unfreed query results shouldn't exceed a couple K each, so this doesn't explain hundreds of megs of bloat either. Still, you might try fixing it and see if it makes a difference.

The next move I can think of is to "kill -ABRT" the pg_dump run after it's gotten to some moderate size (50 meg at most) and then manually poke through the resulting core file to get a sense of what it's filling memory with.

regards, tom lane
Steve Krall <swalker@iglou.com> writes:
> [root@r-and-d /tmp]# strings core | grep "DROP TRIGGER" | wc -l
> 5450219
>
> We have a lot of triggers, but not 5.5 million :)

Looks like a smoking gun to me ...

Armed with that, I went back to the 7.1 source code, and what I see is a huge (like O(N^2) in the number of triggers) leak in this routine. What it needs is a "resetPQExpBuffer(delqry);" call somewhere in the loop over triggers. I'd suggest putting it right before the line

    appendPQExpBuffer(delqry, "DROP TRIGGER %s ", fmtId(tgname, force_quotes));

I see the same leak in 7.2, btw, but it's gone in 7.3. Didn't look at prior releases ...

(It seems the reason no one noticed is that the constructed string isn't actually *used* anyplace. Sigh.)

regards, tom lane
I added the line at the place you suggested, and pg_dump now works as expected. Thank you for taking the time to debug this!

Steve

On Fri, 19 Mar 2004, Tom Lane wrote:
> Looks like a smoking gun to me ...
>
> Armed with that, I went back to the 7.1 source code, and what I see is a
> huge (like O(N^2) in the number of triggers) leak in this routine. What
> it needs is a "resetPQExpBuffer(delqry);" call somewhere in the loop
> over triggers.
> ...
> regards, tom lane