Thread: More 8.2 client issues (Was: [Slow dump?)
Hmm... This gets stranger and stranger. When connecting to the database with the psql client in 8.2's bin directory and using commands such as \d the client hangs, or takes an extremely long time. If we connect to the same 8.2 database with a psql client from 8.1.4, both remotely and locally, \d responds immediately. Could the issue be with the client programs somehow? Note also that we did our migration over the xmas weekend using the dump straight into a restore command. We kicked it off Saturday (12-23-06) night and it had just reached the point of adding foreign keys the morning of the 26th. We stopped it there, wrote a script to go through and build indexes (which finished in a timely manner) and have added just the foreign keys strictly necessary for our applications functionality (i.e. foreign keys set to cascade on update/delete, etc...). -------- Original Message -------- Subject: Re: [PERFORM] Slow dump? Date: Tue, 02 Jan 2007 11:40:18 -0600 From: Erik Jones <erik@myemma.com> To: Tom Lane <tgl@sss.pgh.pa.us> CC: pgsql-performance@postgresql.org References: <459A8B51.8000906@myemma.com> <5876.1167759024@sss.pgh.pa.us> Tom Lane wrote: > Erik Jones <erik@myemma.com> writes: > >> Hello, we recently migrated our system from 8.1.x to 8.2 and when >> running dumps have noticed an extreme decrease in speed where the dump >> is concerned (by more than a factor of 2). >> > > That's odd. pg_dump is normally pretty much I/O bound, at least > assuming your tables are sizable. The only way it wouldn't be is if you > have a datatype with a very slow output converter. Have you looked into > exactly which tables are slow to dump and what datatypes they contain? > (Running pg_dump with log_min_duration_statement enabled would provide > useful data about which steps take a long time, if you're not sure.) > > regards, tom lane > Well, all of our tables use pretty basic data types: integer (various sizes), text, varchar, boolean, and timestamps without time zone. In addition, other than not having a lot of our foreign keys in place, there have been no other schema changes since the migration. -- erik jones <erik@myemma.com> software development emma(r) ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly -- erik jones <erik@myemma.com> software development emma(r)
Erik Jones wrote: > Hmm... This gets stranger and stranger. When connecting to the > database with the psql client in 8.2's bin directory and using commands > such as \d the client hangs, or takes an extremely long time. If we > connect to the same 8.2 database with a psql client from 8.1.4, both > remotely and locally, \d responds immediately. Could the issue be with > the client programs somehow? Couldn't be some DNS problems that only affect the 8.2 client I suppose? -- Richard Huxton Archonet Ltd
Richard Huxton wrote: > Erik Jones wrote: >> Hmm... This gets stranger and stranger. When connecting to the >> database with the psql client in 8.2's bin directory and using >> commands such as \d the client hangs, or takes an extremely long >> time. If we connect to the same 8.2 database with a psql client from >> 8.1.4, both remotely and locally, \d responds immediately. Could the >> issue be with the client programs somehow? > > Couldn't be some DNS problems that only affect the 8.2 client I suppose? > Hmm... I don't see how that would matter when the 8.2. client is being run locally. -- erik jones <erik@myemma.com> software development emma(r)
Erik Jones <erik@myemma.com> writes: > Hmm... This gets stranger and stranger. When connecting to the > database with the psql client in 8.2's bin directory and using commands > such as \d the client hangs, or takes an extremely long time. Hangs at what point? During connection? Try strace'ing psql (or whatever the Solaris equivalent is) to see what it's doing. regards, tom lane
Tom Lane wrote: > Erik Jones <erik@myemma.com> writes: > >> Hmm... This gets stranger and stranger. When connecting to the >> database with the psql client in 8.2's bin directory and using commands >> such as \d the client hangs, or takes an extremely long time. >> > > Hangs at what point? During connection? Try strace'ing psql (or > whatever the Solaris equivalent is) to see what it's doing. > Ok, here's the truss output when attached to psql with "\d pg_class", I put a marker where the pause is. Note that today the pause is only (sic) about 3-4 seconds long before the command completes and the output is displayed and that the only difference in the system between yesterday and today is that today we don't have a dump running. I realize that most of this output below is unnecessary, but while I know what most of this is doing individually, I wouldn't know what to cut out for brevity's sake without accidentally also clipping something that is needed. read(0, 0x08047B7B, 1) (sleeping...) read(0, " \", 1) = 1 write(1, " \", 1) = 1 read(0, " d", 1) = 1 write(1, " d", 1) = 1 read(0, " ", 1) = 1 write(1, " ", 1) = 1 read(0, " ", 1) = 1 write(1, " ", 1) = 1 read(0, " p", 1) = 1 write(1, " p", 1) = 1 read(0, "7F", 1) = 1 write(1, "\b \b", 3) = 3 read(0, "7F", 1) = 1 write(1, "\b \b", 3) = 3 read(0, " p", 1) = 1 write(1, " p", 1) = 1 read(0, " g", 1) = 1 write(1, " g", 1) = 1 read(0, " _", 1) = 1 write(1, " _", 1) = 1 read(0, " c", 1) = 1 write(1, " c", 1) = 1 read(0, " l", 1) = 1 write(1, " l", 1) = 1 read(0, " a", 1) = 1 write(1, " a", 1) = 1 read(0, " s", 1) = 1 write(1, " s", 1) = 1 read(0, " s", 1) = 1 write(1, " s", 1) = 1 read(0, "\r", 1) = 1 write(1, "\n", 1) = 1 lwp_sigmask(SIG_SETMASK, 0x00000002, 0x00000000) = 0xFFBFFEFF [0x0000FFFF] ioctl(0, TCSETSW, 0xFEF431E0) = 0 lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000) = 0xFFBFFEFF [0x0000FFFF] sigaction(SIGINT, 0x08047B80, 0x08047BD0) = 0 sigaction(SIGTERM, 0x08047B80, 0x08047BD0) = 0 sigaction(SIGQUIT, 0x08047B80, 0x08047BD0) = 0 sigaction(SIGALRM, 0x08047B80, 0x08047BD0) = 0 sigaction(SIGTSTP, 0x08047B80, 0x08047BD0) = 0 sigaction(SIGTTOU, 0x08047B80, 0x08047BD0) = 0 sigaction(SIGTTIN, 0x08047B80, 0x08047BD0) = 0 sigaction(SIGWINCH, 0x08047B80, 0x08047BD0) = 0 sigaction(SIGWINCH, 0x08047B80, 0x08047BD0) = 0 sigaction(SIGPIPE, 0x08046E20, 0x08046E70) = 0 send(4, " Q\0\0\0E5 S E L E C T ".., 230, 0) = 230 <----------------------------------------------------------- Hang is right here! sigaction(SIGPIPE, 0x08046E20, 0x08046E70) = 0 pollsys(0x08046EE8, 1, 0x00000000, 0x00000000) (sleeping...) pollsys(0x08046EE8, 1, 0x00000000, 0x00000000) = 1 recv(4, " T\0\0\0 P\003 o i d\0\0".., 16384, 0) = 140 sigaction(SIGPIPE, 0x08046E20, 0x08046E70) = 0 send(4, " Q\0\0\08F S E L E C T ".., 144, 0) = 144 sigaction(SIGPIPE, 0x08046E20, 0x08046E70) = 0 pollsys(0x08046EE8, 1, 0x00000000, 0x00000000) = 1 recv(4, " T\0\0\0D3\007 r e l h a".., 16384, 0) = 272 sigaction(SIGPIPE, 0x08046E20, 0x08046E70) = 0 send(4, " Q\0\00186 S E L E C T ".., 391, 0) = 391 sigaction(SIGPIPE, 0x08046E20, 0x08046E70) = 0 pollsys(0x08046EE8, 1, 0x00000000, 0x00000000) = 1 recv(4, " T\0\0\08F\005 a t t n a".., 16384, 0) = 1375 sigaction(SIGPIPE, 0x08046E20, 0x08046E70) = 0 send(4, " Q\0\001 g S E L E C T ".., 360, 0) = 360 sigaction(SIGPIPE, 0x08046E20, 0x08046E70) = 0 pollsys(0x08046EE8, 1, 0x00000000, 0x00000000) = 1 recv(4, " T\0\0\0DD\007 r e l n a".., 16384, 0) = 526 sigaction(SIGPIPE, 0x08046E20, 0x08046E70) = 0 send(4, " Q\0\0\090 S E L E C T ".., 145, 0) = 145 sigaction(SIGPIPE, 0x08046E20, 0x08046E70) = 0 pollsys(0x08046EE8, 1, 0x00000000, 0x00000000) = 1 recv(4, " T\0\0\0 \001 r e l n a".., 16384, 0) = 51 ioctl(0, TCGETA, 0x08046F24) = 0 ioctl(1, TCGETA, 0x08046F24) = 0 ioctl(1, TIOCGWINSZ, 0x08046F58) = 0 write(1, " T a b l e ".., 34) = 34 write(1, " C o l u m n ".., 41) = 41 write(1, " - - - - - - - - - - - -".., 41) = 41 write(1, " r e l n a m e ".., 39) = 39 write(1, " r e l n a m e s p a c".., 39) = 39 write(1, " r e l t y p e ".., 39) = 39 write(1, " r e l o w n e r ".., 39) = 39 write(1, " r e l a m ".., 39) = 39 write(1, " r e l f i l e n o d e".., 39) = 39 write(1, " r e l t a b l e s p a".., 39) = 39 write(1, " r e l p a g e s ".., 39) = 39 write(1, " r e l t u p l e s ".., 39) = 39 write(1, " r e l t o a s t r e l".., 39) = 39 write(1, " r e l t o a s t i d x".., 39) = 39 write(1, " r e l h a s i n d e x".., 39) = 39 write(1, " r e l i s s h a r e d".., 39) = 39 write(1, " r e l k i n d ".., 39) = 39 write(1, " r e l n a t t s ".., 39) = 39 write(1, " r e l c h e c k s ".., 39) = 39 write(1, " r e l t r i g g e r s".., 39) = 39 write(1, " r e l u k e y s ".., 39) = 39 write(1, " r e l f k e y s ".., 39) = 39 write(1, " r e l r e f s ".., 39) = 39 write(1, " r e l h a s o i d s ".., 39) = 39 write(1, " r e l h a s p k e y ".., 39) = 39 write(1, " r e l h a s r u l e s".., 39) = 39 write(1, " r e l h a s s u b c l".., 39) = 39 write(1, " r e l f r o z e n x i".., 39) = 39 write(1, " r e l a c l ".., 31) = 31 write(1, " r e l o p t i o n s ".., 31) = 31 write(1, " I n d e x e s :\n", 9) = 9 write(1, " " p g _ c l a s".., 45) = 45 write(1, " " p g _ c l a s".., 71) = 71 write(1, "\n", 1) = 1 lwp_sigmask(SIG_SETMASK, 0x00000002, 0x00000000) = 0xFFBFFEFF [0x0000FFFF] ioctl(0, TIOCGWINSZ, 0x08047B58) = 0 ioctl(0, TIOCSWINSZ, 0x08047B58) = 0 ioctl(0, TCGETS, 0x08047BB0) = 0 ioctl(0, TCSETSW, 0x08047BB0) = 0 lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000) = 0xFFBFFEFF [0x0000FFFF] sigaction(SIGINT, 0x08047AA0, 0x08047B10) = 0 sigaction(SIGTERM, 0x08047AA0, 0x08047B10) = 0 sigaction(SIGQUIT, 0x08047AA0, 0x08047B10) = 0 sigaction(SIGALRM, 0x08047B00, 0x08047B70) = 0 sigaction(SIGTSTP, 0x08047AA0, 0x08047B10) = 0 sigaction(SIGTTOU, 0x08047AA0, 0x08047B10) = 0 sigaction(SIGTTIN, 0x08047AA0, 0x08047B10) = 0 sigaction(SIGWINCH, 0x08047AA0, 0x08047B10) = 0 write(1, " e m m a 2 = # ", 8) = 8 read(0, 0x08047B7B, 1) (sleeping...) -- erik jones <erik@myemma.com> software development emma(r)
Erik Jones <erik@myemma.com> writes: > ... > sigaction(SIGPIPE, 0x08046E20, 0x08046E70) = 0 > send(4, " Q\0\0\0E5 S E L E C T ".., 230, 0) = 230 > <----------------------------------------------------------- Hang is > right here! > sigaction(SIGPIPE, 0x08046E20, 0x08046E70) = 0 > pollsys(0x08046EE8, 1, 0x00000000, 0x00000000) (sleeping...) > pollsys(0x08046EE8, 1, 0x00000000, 0x00000000) = 1 > recv(4, " T\0\0\0 P\003 o i d\0\0".., 16384, 0) = 140 > ... Hmph. So it seems the delay really is on the server's end. Any chance you could truss the connected backend process too and see what it's doing? Actually ... before you do that, the first query for "\d pg_class" should look like SELECT c.oid, n.nspname, c.relname FROM pg_catalog.pg_class c LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace WHERE c.relname ~ '^(pg_class)$' AND pg_catalog.pg_table_is_visible(c.oid) ORDER BY 2, 3; I could see this taking an unreasonable amount of time if you had a huge number of pg_class rows or a very long search_path --- is your database at all out of the ordinary in those ways? regards, tom lane
Tom Lane wrote: > Erik Jones <erik@myemma.com> writes: > >> ... >> sigaction(SIGPIPE, 0x08046E20, 0x08046E70) = 0 >> send(4, " Q\0\0\0E5 S E L E C T ".., 230, 0) = 230 >> <----------------------------------------------------------- Hang is >> right here! >> sigaction(SIGPIPE, 0x08046E20, 0x08046E70) = 0 >> pollsys(0x08046EE8, 1, 0x00000000, 0x00000000) (sleeping...) >> pollsys(0x08046EE8, 1, 0x00000000, 0x00000000) = 1 >> recv(4, " T\0\0\0 P\003 o i d\0\0".., 16384, 0) = 140 >> ... >> > > Hmph. So it seems the delay really is on the server's end. Any chance > you could truss the connected backend process too and see what it's doing? > > Actually ... before you do that, the first query for "\d pg_class" > should look like > > SELECT c.oid, > n.nspname, > c.relname > FROM pg_catalog.pg_class c > LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace > WHERE c.relname ~ '^(pg_class)$' > AND pg_catalog.pg_table_is_visible(c.oid) > ORDER BY 2, 3; > > I could see this taking an unreasonable amount of time if you had a huge > number of pg_class rows or a very long search_path --- is your database > at all out of the ordinary in those ways? > Well, running "select count(*) from pg_class;" returns 524699 rows and our search path is the default. I'd also like to reiterate that \d pg_class returns instantly when run from the 8.1.4 psql client connected to the 8.2 db. How would I go about determining which backend server process psql was attached to? -- erik jones <erik@myemma.com> software development emma(r)
Erik, Could you set log_min_duration_statement=0 on your server and enable logging (tutorial here if you don't know how to do that: http://pgfouine.projects.postgresql.org/tutorial.html). You should see which queries are executed in both cases and find the slow one easily. Regards, -- Guillaume
Erik Jones <erik@myemma.com> writes: > Tom Lane wrote: >> I could see this taking an unreasonable amount of time if you had a huge >> number of pg_class rows or a very long search_path --- is your database >> at all out of the ordinary in those ways? >> > Well, running "select count(*) from pg_class;" returns 524699 rows Ouch. > our search path is the default. I'd also like to reiterate that \d > pg_class returns instantly when run from the 8.1.4 psql client connected > to the 8.2 db. I think I know where the problem is: would you compare timing of select * from pg_class where c.relname ~ '^(pg_class)$'; select * from pg_class where c.relname ~ '^pg_class$'; Recent versions of psql put parentheses into the regex pattern for safety in case it's got "|", but I just realized that that probably confuses the optimizer's check for an indexable regex :-( However, this only explains slowdown in psql's \d commands, which wasn't your original complaint ... regards, tom lane
Tom Lane wrote: > Erik Jones <erik@myemma.com> writes: > >> Tom Lane wrote: >> >>> I could see this taking an unreasonable amount of time if you had a huge >>> number of pg_class rows or a very long search_path --- is your database >>> at all out of the ordinary in those ways? >>> >>> >> Well, running "select count(*) from pg_class;" returns 524699 rows >> > > Ouch. > > >> our search path is the default. I'd also like to reiterate that \d >> pg_class returns instantly when run from the 8.1.4 psql client connected >> to the 8.2 db. >> > > I think I know where the problem is: would you compare timing of > > select * from pg_class where c.relname ~ '^(pg_class)$'; > Approximately 4 seconds. > select * from pg_class where c.relname ~ '^pg_class$'; > Instant. > Recent versions of psql put parentheses into the regex pattern for > safety in case it's got "|", but I just realized that that probably > confuses the optimizer's check for an indexable regex :-( > > However, this only explains slowdown in psql's \d commands, which > wasn't your original complaint ... > Well, it explains the slowdown wrt a query against the catalog tables by a postgres client application. Were there any changes made like this to pg_dump and/or pg_restore? -- erik jones <erik@myemma.com> software development emma(r)
Guillaume Smet wrote: > Erik, > > Could you set log_min_duration_statement=0 on your server and enable > logging (tutorial here if you don't know how to do that: > http://pgfouine.projects.postgresql.org/tutorial.html). > > You should see which queries are executed in both cases and find the > slow one easily. Heh, unfortunately, setting log_min_duration_statement=0 would be a total last resort as the last we counted (2 months ago) we were doing approximately 3 million transactions per hour. -- erik jones <erik@myemma.com> software development emma(r)
Erik Jones <erik@myemma.com> writes: > Guillaume Smet wrote: >> Could you set log_min_duration_statement=0 on your server and enable > Heh, unfortunately, setting log_min_duration_statement=0 would be a > total last resort as the last we counted (2 months ago) we were doing > approximately 3 million transactions per hour. Do it just for the pg_dump: export PGOPTIONS="--log_min_duration_statement=0" pg_dump ... I don't think that the regex issue explains pg_dump being slow, unless perhaps you are making use of the table-selection switches? regards, tom lane
Tom Lane wrote: > Erik Jones <erik@myemma.com> writes: > >> Guillaume Smet wrote: >> >>> Could you set log_min_duration_statement=0 on your server and enable >>> > > >> Heh, unfortunately, setting log_min_duration_statement=0 would be a >> total last resort as the last we counted (2 months ago) we were doing >> approximately 3 million transactions per hour. >> > > Do it just for the pg_dump: > > export PGOPTIONS="--log_min_duration_statement=0" > pg_dump ... > > I don't think that the regex issue explains pg_dump being slow, > unless perhaps you are making use of the table-selection switches? > That's a good idea, but first I'll still need to run it by my sysadmin wrt space -- our dump files are around 22GB when we can let them finish these days. We do have plans to move off of the dump to a snapshot backup strategy that will eventually lead to a PITR warm-standby setup but, first, we want to make sure we have a stable, fast, up-to-date server -- our web servers are still connecting to the db via 8.1.4 client libs as given what we've seen of the track record for 8.2. client libs on our setup, we're bit reticent to move the rest of the application over. While I wait to see what we can do about logging everything during the dump I'll probably build 8.2 on a remote linux machine and see how connecting via those tools compares. -- erik jones <erik@myemma.com> software development emma(r)
Erik Jones <erik@myemma.com> writes: > That's a good idea, but first I'll still need to run it by my sysadmin > wrt space -- our dump files are around 22GB when we can let them finish > these days. Given that we're now speculating about regex problems, you could do a test run of "pg_dump -s" with logging enabled; that shouldn't take an unreasonable amount of time or space. regards, tom lane
Tom Lane wrote: > Erik Jones <erik@myemma.com> writes: > >> That's a good idea, but first I'll still need to run it by my sysadmin >> wrt space -- our dump files are around 22GB when we can let them finish >> these days. >> > > Given that we're now speculating about regex problems, you could do a > test run of "pg_dump -s" with logging enabled; that shouldn't take an > unreasonable amount of time or space. > > regards, tom lane > Sounds like a post-lunch plan! By the way, even though this isn't even solved yet, thank you for all of your help! -- erik jones <erik@myemma.com> software development emma(r)
Erik Jones wrote: > Tom Lane wrote: >> Erik Jones <erik@myemma.com> writes: >> >>> That's a good idea, but first I'll still need to run it by my >>> sysadmin wrt space -- our dump files are around 22GB when we can let >>> them finish these days. >>> >> >> Given that we're now speculating about regex problems, you could do a >> test run of "pg_dump -s" with logging enabled; that shouldn't take an >> unreasonable amount of time or space. >> >> regards, tom lane >> > Sounds like a post-lunch plan! By the way, even though this isn't > even solved yet, thank you for all of your help! > Ok, this ended up taking a bit longer to get to due to the fact that we've been building indexes on our user tables off and on for the last few days. But, I'm back on it now. Here is my general plan of action: I'm going to do a schema dump of the pg_catalog schema from a fresh, clean 8.2 install and, tomorrow night after I do the same against the db we've been having issues with, diff the two to see if there are any glaring discrepancies. While running the dump from the live db I will have statement logging on for the dump, are there any queries or query lengths that I should pay particular attention to? -- erik jones <erik@myemma.com> software development emma(r)
On Wed, Jan 03, 2007 at 11:56:20AM -0500, Tom Lane wrote: > Erik Jones <erik@myemma.com> writes: > > Tom Lane wrote: > >> I could see this taking an unreasonable amount of time if you had a huge > >> number of pg_class rows or a very long search_path --- is your database > >> at all out of the ordinary in those ways? > >> > > Well, running "select count(*) from pg_class;" returns 524699 rows > > Ouch. > > > our search path is the default. I'd also like to reiterate that \d > > pg_class returns instantly when run from the 8.1.4 psql client connected > > to the 8.2 db. > > I think I know where the problem is: would you compare timing of > > select * from pg_class where c.relname ~ '^(pg_class)$'; > > select * from pg_class where c.relname ~ '^pg_class$'; > > Recent versions of psql put parentheses into the regex pattern for > safety in case it's got "|", but I just realized that that probably > confuses the optimizer's check for an indexable regex :-( > > However, this only explains slowdown in psql's \d commands, which > wasn't your original complaint ... On the other hand, with 500k relations pg_dump is presumably going to be doing a lot of querying of the catalog tables, so if it uses similar queries... -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)