Thread: pg_restore stuck in a loop?
I took a pg_dump -Fc from a Pg 8.0.1 installation on (Solaris / Sun) with Slony installed and attempted to restore to 8.0.2 on Linux / x86 without Slony installed. pg_restore skipped all of the Slony related items, but then it immediately jumped up to 700MB of memory usage and 99% CPU, but the DB connection was idle. Two hours later, memory usage had gradually dropped to 500MB, but CPU was still going full still with an idle database connection. I eventually clued in and made a TOC and removed all of the Slony items, but I'm still curious to know what exactly pg_restore had been doing for the last hour or so. --
Rod Taylor <pg@rbt.ca> writes: > I eventually clued in and made a TOC and removed all of the Slony items, > but I'm still curious to know what exactly pg_restore had been doing for > the last hour or so. You tell us ;-). You've got the test case, attach to it with a debugger and find out what it's doing. regards, tom lane
On Tue, 2005-04-26 at 23:22 -0400, Tom Lane wrote: > Rod Taylor <pg@rbt.ca> writes: > > I eventually clued in and made a TOC and removed all of the Slony items, > > but I'm still curious to know what exactly pg_restore had been doing for > > the last hour or so. > > You tell us ;-). You've got the test case, attach to it with a debugger > and find out what it's doing. I wasn't entirely sure how to "catch it in action" so I just used CTRL+C to interrupt the pg_restore process, and got these as backtraces: (gdb) backtrace #0 0xb74b9589 in mremap () from /lib/tls/libc.so.6 #1 0xb744dc13 in mremap_chunk () from /lib/tls/libc.so.6 #2 0xb744dfd1 in realloc () from /lib/tls/libc.so.6 #3 0xb75e46b9 in enlargePQExpBuffer () from /home/rbt/pg8/lib/libpq.so.4 #4 0xb75e481f in appendPQExpBufferChar () from /home/rbt/pg8/lib/libpq.so.4 #5 0x0804efe3 in _sendSQLLine () #6 0x0804f2bd in ExecuteSqlCommandBuf () #7 0x0805019e in _PrintData () #8 0x0804fee5 in _PrintTocData () #9 0x0804aba2 in RestoreArchive () #10 0x0804a525 in main () (gdb) backtrace #0 0xb75e465e in enlargePQExpBuffer () from /home/rbt/pg8/lib/libpq.so.4 #1 0xb75e481f in appendPQExpBufferChar () from /home/rbt/pg8/lib/libpq.so.4 #2 0x0804efe3 in _sendSQLLine () #3 0x0804f2bd in ExecuteSqlCommandBuf () #4 0x0805019e in _PrintData () #5 0x0804fee5 in _PrintTocData () #6 0x0804aba2 in RestoreArchive () #7 0x0804a525 in main () (gdb) backtrace #0 0xb75e4800 in appendPQExpBufferChar () from /home/rbt/pg8/lib/libpq.so.4 #1 0x0804efe3 in _sendSQLLine () #2 0x0804f2bd in ExecuteSqlCommandBuf () #3 0x0805019e in _PrintData () #4 0x0804fee5 in _PrintTocData () #5 0x0804aba2 in RestoreArchive () #6 0x0804a525 in main () --
Rod Taylor <pg@rbt.ca> writes: > On Tue, 2005-04-26 at 23:22 -0400, Tom Lane wrote: >> You tell us ;-). You've got the test case, attach to it with a debugger >> and find out what it's doing. > I wasn't entirely sure how to "catch it in action" so I just used CTRL+C > to interrupt the pg_restore process, and got these as backtraces: Hm, I wonder if there is something broken about the "SQL parsing" logic in _sendSQLLine, such that it could go into an infinite loop given the right input? regards, tom lane
On Thu, 2005-04-28 at 02:11 -0400, Tom Lane wrote: > Rod Taylor <pg@rbt.ca> writes: > > On Tue, 2005-04-26 at 23:22 -0400, Tom Lane wrote: > >> You tell us ;-). You've got the test case, attach to it with a debugger > >> and find out what it's doing. > > > I wasn't entirely sure how to "catch it in action" so I just used CTRL+C > > to interrupt the pg_restore process, and got these as backtraces: > > Hm, I wonder if there is something broken about the "SQL parsing" logic > in _sendSQLLine, such that it could go into an infinite loop given the > right input? Let me see if I can put together a test case which is smaller for next week, and I'll send that in. --
Rod Taylor <pg@rbt.ca> writes: > On Thu, 2005-04-28 at 02:11 -0400, Tom Lane wrote: >> Hm, I wonder if there is something broken about the "SQL parsing" logic >> in _sendSQLLine, such that it could go into an infinite loop given the >> right input? > Let me see if I can put together a test case which is smaller for next > week, and I'll send that in. If the parsing-bug theory is right, then there is probably some one object in your database that will make even "pg_dump -s" go nuts. regards, tom lane
Rod Taylor <pg@rbt.ca> writes: > On Tue, 2005-04-26 at 23:22 -0400, Tom Lane wrote: >> Rod Taylor <pg@rbt.ca> writes: >>> I eventually clued in and made a TOC and removed all of the Slony items, >>> but I'm still curious to know what exactly pg_restore had been doing for >>> the last hour or so. >> >> You tell us ;-). You've got the test case, attach to it with a debugger >> and find out what it's doing. > I wasn't entirely sure how to "catch it in action" so I just used CTRL+C > to interrupt the pg_restore process, and got these as backtraces: I had forgotten about this problem, but noticed it again in my mail folder. After staring at the code for awhile, I have a theory: whoever added dollar-quote parsing to _sendSQLLine() did not understand that it has to keep all its state in AH->sqlparse, because the input it's passed can be broken into arbitrary bufferload boundaries. Suppose that the input contains $1$2 (which should properly be parsed as two parameter symbols) and a bufferload boundary happens to fall between the first $ and the 1. Because startDT is incorrectly treated as a reinitializable local, the code will mistakenly decide that $1$ is a dollar-quote start tag, and start looking for the matching close tag. As long as it doesn't happen to find a match, all the rest of the output intended to go to the database (including both SQL commands and COPY data) is going to get crammed into AH->sqlBuf. This wouldn't be an infinite loop exactly, but it could certainly drive the program into swap hell before long, if the dump file contains sufficiently many megabytes of data. To believe that this explains your report, we'd have to assume that the Slony part of the dump included such a string and it wasn't within any other form of quoting. I'm not sure offhand whether pg_dump would ever actually produce such a string --- in most contexts I'd expect "$n" to be surrounded by spaces. Whether or not this actually is what you saw, the code is definitely broken. Will work on cleaning it up. regards, tom lane