Thread: commit so slow program looks frozen
(I tried this question on the interface forum and got no result, but I don't know how to tell if it's an interface issue or not) I have a TCL app which typically takes hours to complete. I found out that it is taking longer than it should because it occasionally stalls inexplicably (for tens of minute at a time) then usually continues. There are a minimum of four apps running at the same time, all reading different sections of the same table, all writing to the same db and the same tables. The other apps seem unaffected by the one app that freezes. This happens running "pg_exec $conn "commit" from within a TCL script on a client app. The delays are so long that I used to think the app was hopelessly frozen. By accident, I left the app alone in its frozen state and came back a good deal later and seen that it was running again. Sometimes I decide it *IS* frozen and have to restart. Because Ctrl-C will not cause the script to break, it appears the app is stuck in non-TCL code (either waiting for postgres or stuck in the interface code?) The application loops through an import file, reading one row at a time, and issues a bunch of inserts and updates to various tables. There's a simple pg_exec $conn "start transaction" at the beginning of the loop and the commit at the end. The commit actually appears to be going through. There are no messages of any significance in the log. There do not appear to be any outstanding locks or transactions. I am not doing any explicit locking, all transaction settings are set to default. Any thoughts on the cause and possible solutions would be appreciated. Carlo
Carlo Stonebanks wrote: > The delays are so long that I used to think the app was hopelessly frozen. > By accident, I left the app alone in its frozen state and came back a good > deal later and seen that it was running again. > > Sometimes I decide it *IS* frozen and have to restart. Because Ctrl-C will > not cause the script to break, it appears the app is stuck in non-TCL code > (either waiting for postgres or stuck in the interface code?) You may try to figure out what's the process doing (the backend obviously, not the frontend (Tcl) process) by attaching to it with strace. Is it doing system calls? Maybe it's busy reading from or writing to disk. Maybe it's swamped by a context switch storm (but in that case, probably the other processes would be affected as well). Or you may want to attach to it with GDB and see what the backtrace looks like. If nothing obvious pops up, do it several times and compare them. I wouldn't expect it to be stuck on locks, because if it's only on commit, then it probably has all the locks it needs. But try to see if you can find something not granted in pg_locks that it may be stuck on. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
> You may try to figure out what's the process doing (the backend > obviously, not the frontend (Tcl) process) by attaching to it with > strace. It's so sad when us poor Windows guys get helpful hints from people assume that we're smart enough to run *NIX... ;-) > Maybe it's swamped by a context switch storm (but in that case, probably > the other processes would be affected as well). What is a context switch storm? (and what a great name for a heavy metal rock band!) Interestingly enough, last night (after the original post) I watched three of the processes slow down, one after the other - and then stall for so long that I had assumed they had frozen. They were all stalled on a message that I had put in the script that indicated they had never returned from a commit. I have looked into this, and I believe the commits are actually going through. The remaining 4th process continued to run, and actually picked up speed as the CPU gave its cycles over. The Windows task manager shows the postgresql processes that (I assume) are associated with the stalled processes as consuming zero CPU time. Sometimes I have seen all of the apps slow down and momentarrily freeze at the same time... but then continue. I have autovacuum off, although stats_row_level and stats_start_collector remain on (I thought these were only relevant if autovacuum was on). I have seen the apps slow down (and perhaps stall) when specifical tables have vacuum/analyze running, and that makes sense. I did notice that on one occasion a "frozen" app came back to life after I shut down EMS PostgreSQL manager in another session. Maybe a coincidence, or maybe an indication that the apps are straining resources... on a box with two twin-core XEONs and 4GB of memory? Mind you, the config file is confgiured for the database loading phase weare in now - with lots of resources devoted to a few connections. > I wouldn't expect it to be stuck on locks, because if it's only on > commit, then it probably has all the locks it needs. But try to see if > you can find something not granted in pg_locks that it may be stuck on. Looking at the pgadmin server status pages, no locks or transactions are pending when this happens. Carlo
>> I have a question for you: did you have a long running query keeping open a transaction? I've just noticed the same problem here, but things cleaned up immediately when I aborted the long-running transaction.
No, the only processes are from those in the import applications themselves: short transactions never lasting more than a fraction of a second.
Carlo
Carlo Stonebanks wrote:
Note that in my case the long-running transaction wasn't idle in transaction, it was just doing a whole lot of work.
Brian
I have a question for you: did you have a long running query keeping open a transaction? I've just noticed the same problem here, but things cleaned up immediately when I aborted the long-running transaction.You may try to figure out what's the process doing (the backend obviously, not the frontend (Tcl) process) by attaching to it with strace.It's so sad when us poor Windows guys get helpful hints from people assume that we're smart enough to run *NIX... ;-)Maybe it's swamped by a context switch storm (but in that case, probably the other processes would be affected as well).What is a context switch storm? (and what a great name for a heavy metal rock band!) Interestingly enough, last night (after the original post) I watched three of the processes slow down, one after the other - and then stall for so long that I had assumed they had frozen. They were all stalled on a message that I had put in the script that indicated they had never returned from a commit. I have looked into this, and I believe the commits are actually going through.
Note that in my case the long-running transaction wasn't idle in transaction, it was just doing a whole lot of work.
Brian
On Wed, 2006-10-25 at 15:07, Carlo Stonebanks wrote: > > You may try to figure out what's the process doing (the backend > > obviously, not the frontend (Tcl) process) by attaching to it with > > strace. > > It's so sad when us poor Windows guys get helpful hints from people assume > that we're smart enough to run *NIX... ;-) You should try a google search on strace and NT or windows or XP... I was surprised how many various implementations of it I found. > > > Maybe it's swamped by a context switch storm (but in that case, probably > > the other processes would be affected as well). > > What is a context switch storm? (and what a great name for a heavy metal > rock band!) I can just see the postgresql group getting together at the next O'Reilley's conference and creating that band. And it will all be your fault. A context switch storm is when your machine spends more time trying to figure out what to do than actually doing anything. The CPU spends most it's time switching between programs than running them. > I have seen the apps slow down (and perhaps stall) when specifical tables > have vacuum/analyze running, and that makes sense. I did notice that on one > occasion a "frozen" app came back to life after I shut down EMS PostgreSQL > manager in another session. Maybe a coincidence, or maybe an indication that > the apps are straining resources... on a box with two twin-core XEONs and > 4GB of memory? Mind you, the config file is confgiured for the database > loading phase weare in now - with lots of resources devoted to a few > connections. Seeing as PostgreSQL runs one thread / process per connection, it's pretty unlikely that the problem here is one "hungry" thread. Do all four CPUs show busy, or just one? Do you have a way of measuring how much time is spent waiting on I/O on a windows machine like top / vmstat does in unix? Is it possible your machine is going into a swap storm? i.e. you've used all physical memory somehow and it's swapping out? If your current configuration is too aggresive on sort / work mem then it can happen with only a few connections. Note that if you have an import process that needs a big chunk of memory, you can set just that one connection to use a large setting and leave the default smaller.
> >>> Maybe it's swamped by a context switch storm (but in that case, probably >>> the other processes would be affected as well). >> What is a context switch storm? (and what a great name for a heavy metal >> rock band!) > > I can just see the postgresql group getting together at the next > O'Reilley's conference and creating that band. And it will all be your > fault. Well now you let the secret out! Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
On Wed, Oct 25, 2006 at 04:32:16PM -0400, Carlo Stonebanks wrote: > >> I have a question for you: did you have a long running query keeping open > a transaction? I've just noticed the same problem here, but things cleaned > up immediately when I aborted the long-running transaction. > > No, the only processes are from those in the import applications themselves: > short transactions never lasting more than a fraction of a second. Do you have a linux/unix machine you could reproduce this on? -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
> > > You may try to figure out what's the process doing (the backend > > > obviously, not the frontend (Tcl) process) by attaching > to it with > > > strace. > > > > It's so sad when us poor Windows guys get helpful hints from people > > assume that we're smart enough to run *NIX... ;-) > > You should try a google search on strace and NT or windows or > XP... I was surprised how many various implementations of it I found. Let me know if you find one that's stable, I've been wanting that. I've tried one or two, but it's always been just a matter of time before the inevitable BSOD. > > > Maybe it's swamped by a context switch storm (but in that case, > > > probably the other processes would be affected as well). > > > > What is a context switch storm? (and what a great name for a heavy > > metal rock band!) > > I can just see the postgresql group getting together at the > next O'Reilley's conference and creating that band. And it > will all be your fault. *DO NOT LET DEVRIM SEE THIS THREAD* > A context switch storm is when your machine spends more time > trying to figure out what to do than actually doing anything. > The CPU spends most it's time switching between programs > than running them. I can see Windows benig more sucepitble to this than say Linux, because switching between processes there is a lot more expensive than on Linux. > Seeing as PostgreSQL runs one thread / process per > connection, it's pretty unlikely that the problem here is one > "hungry" thread. Do all four CPUs show busy, or just one? > Do you have a way of measuring how much time is spent waiting > on I/O on a windows machine like top / vmstat does in unix? There are plenty of counters in the Performance Monitor. Specificall, look at "disk queue counters" - they indicate when the I/O subsystem is backed up. //Magnus
I seem to remember Oleg/Teodor recently reporting a problem with Windows hanging on a multi-processor machine, during a heavy load operation. In their case it seemed like a vacuum would allow it to wake up. They did commit a patch that did not make it into the last minor version for lack of testing. Perhaps you could see if that patch might work for you, which would also help ease the argument against the patches lack of testing. -rocco
> I can just see the postgresql group getting together at the next > O'Reilley's conference and creating that band. And it will all be your > fault. Finally, a chance for me to wear my black leather pants. > A context switch storm is when your machine spends more time trying to > figure out what to do than actually doing anything. The CPU spends most > it's time switching between programs than running them. Is thatl likely on a new 4 CPU server that has no clients connected and that is only running four (admittedly heavy) TCL data load scripts? > Seeing as PostgreSQL runs one thread / process per connection, it's > pretty unlikely that the problem here is one "hungry" thread. Do all > four CPUs show busy, or just one? Do you have a way of measuring how > much time is spent waiting on I/O on a windows machine like top / vmstat > does in unix? Before optimising the queries, all four CPU's were pinned to max performance (that's why I only run four imports at a time). After opimisation, all four CPU's are busy, but usage is spikey (which looks more normal), but all are obviously busy. I have this feeling that when an import app freezes, one CPU goes idle while the others stay busy - I will confirm that with the next import operation. I suspect that the server has the Xeon processors that were of a generation which PostgreSQL had a problem with - should a postgresql process be able to distrivute its processing load across CPU's? (i.e. When I see one CPU at 100% while all others are idle?) > Note that if you have an import process that needs a big chunk of > memory, you can set just that one connection to use a large setting and > leave the default smaller. Total memory usage is below the max available. Each postgresql process takes up 500MB, there are four running and I have 4GB of RAM. Carlo
This is pretty interesting - where can I read more on this? Windows isn't actually hanging, one single command line window is - from its behaviour, it looks like the TCL postgresql package is waiting for pg_exec to come back from the commit (I believe the commit has actually gone through). It could even be that there's something wrong with the TCL package, but from my understanding it is one of the most complete interfaces out there - which is weird, because TCL seems to be the most unpopular language in the community. Caro ""Rocco Altier"" <RoccoA@Routescape.com> wrote in message news:6E0907A94904D94B99D7F387E08C4F570192F2D0@FALCON.INSIGHT... >I seem to remember Oleg/Teodor recently reporting a problem with Windows > hanging on a multi-processor machine, during a heavy load operation. > > In their case it seemed like a vacuum would allow it to wake up. They > did commit a patch that did not make it into the last minor version for > lack of testing. > > Perhaps you could see if that patch might work for you, which would also > help ease the argument against the patches lack of testing. > > -rocco > > ---------------------------(end of broadcast)--------------------------- > TIP 6: explain analyze is your friend >
On 10/26/06, Carlo Stonebanks <stonec.register@sympatico.ca> wrote: > This is pretty interesting - where can I read more on this? Windows isn't > actually hanging, one single command line window is - from its behaviour, it > looks like the TCL postgresql package is waiting for pg_exec to come back > from the commit (I believe the commit has actually gone through). > > It could even be that there's something wrong with the TCL package, but from > my understanding it is one of the most complete interfaces out there - which > is weird, because TCL seems to be the most unpopular language in the > community. when it happens, make sure to query pg_locks and see what is going on there lock issues are not supposed to manifest on a commit, which releases locks, but you never know. There have been reports of insonsistent lock ups on windows (espeically multi-processor) which you might be experiencing. Make sure you have the very latest version of pg 8.1.x. Also consider checking out 8.2 and see if you can reproduce the behavior there...this will require compiling postgresql. merlin
> A context switch storm is when your machine spends more time trying to > figure out what to do than actually doing anything. The CPU spends most > it's time switching between programs than running them. Well, we usually use the term "thrashing" as the generic for when your machine is spending more time on overhead than doing user work - this would include paging or context switching, along with whatever else. A context-switch storm would be a specific form of thrashing! Richard -- Richard Troy, Chief Scientist Science Tools Corporation 510-924-1363 or 202-747-1263 rtroy@ScienceTools.com, http://ScienceTools.com/
On Thu, 26 Oct 2006, Carlo Stonebanks wrote: > > It could even be that there's something wrong with the TCL package, but from > my understanding it is one of the most complete interfaces out there - which > is weird, because TCL seems to be the most unpopular language in the > community. > Not that this matters much and it's slightly off the topic of performance, but... ...I would have to check my _ancient_ emails for the name of the guy and the dates, but the integration was first done while I was a researcher at Berkeley, at the tail end of the Postgres team's funding. My team used Postgres with TCL internals to implement "the query from hell" inside the server. That was about 1994 or '95, IIRC. At that time, most people who knew both said that they were roughly equivalent, with PERL being _vastly_ less intelligible (to humans) and they hated it. What happened was PERL got exposure that TCL didn't and people who didn't know better jumped on it. So, it was one of the most complete interfaces because it was done first, or nearly first, by the original guys that created the original Postgres. Richard -- Richard Troy, Chief Scientist Science Tools Corporation 510-924-1363 or 202-747-1263 rtroy@ScienceTools.com, http://ScienceTools.com/
> when it happens, make sure to query pg_locks and see what is going on > there lock issues are not supposed to manifest on a commit, which > releases locks, but you never know. There aren't any pedning locks (assuming that pgAdmin is using pg_locks to display pendin glocks). > There have been reports of > insonsistent lock ups on windows (espeically multi-processor) which > you might be experiencing. Make sure you have the very latest version > of pg 8.1.x. Also consider checking out 8.2 and see if you can > reproduce the behavior there...this will require compiling postgresql. Are these associated with any type of CPU? Carlo
Perl started out fast - TCL started out slow. Perl used syntax that, although it would drive some people crazy, followed a linguistic curve that Larry Wall claimed was healthy. The English language is crazy, and yet, it has become standard world wide as well. Designed, regular languages like Esperanto have not received much support either. Perl is designed to be practical. TCL was designed to be minimalistic. Perl uses common idioms for UNIX programmers. // for regular expressions, $VAR for variables, Many of the statement are familiar for C programmers. ++ for increment (compare against 'incr abc' for TCL). $a=5 for assignment, compare against 'set abc 5' in TCL. TCL tries to have a reduced syntax, where 'everything is a string' which requires wierdness for people. For example, newline is end-of-line, so { must be positioned correctly. Code is a string, so in some cases you need to escape code, otherwise not. Perl has object oriented support built-in. It's ugly, but it works. TCL has a questionable '[incr tcl]' package. Perl has a wealth of modules on CPAN to do almost anything you need to. TCL has the beginning of one (not as rich), but comes built-in with things like event loops, and graphicals (Tk). I could go on and on - but I won't, because this is the PostgreSQL mailing list. People either get Perl, or TCL, or they don't. More people 'get' Perl, because it was marketted better, it's syntax is deceivingly comparable to other well known languages, and for the longest time, it was much faster than TCL to write (especially when using regular expressions) and faster to run. Did TCL get treated unfairly as a result? It's a language. Who cares! :-) Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/
> Perl has a wealth of modules on CPAN to do almost anything you need to. > TCL has the beginning of one (not as rich), but comes built-in with things > like event loops, and graphicals (Tk). > > I could go on and on - but I won't, because this is the PostgreSQL > mailing list. People either get Perl, or TCL, or they don't. More > people 'get' Perl, because it was marketted better, it's syntax is > deceivingly comparable to other well known languages, and for the > longest time, it was much faster than TCL to write (especially when > using regular expressions) and faster to run. > > Did TCL get treated unfairly as a result? It's a language. Who cares! :-) You forgot the god of scripting languages, Python... (Yes perl is much better at system level scripting than Python). Sincerely, Joshua D. Drake > > Cheers, > mark > -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
On Thu, 2006-10-26 at 11:06 -0400, Merlin Moncure wrote: > On 10/26/06, Carlo Stonebanks <stonec.register@sympatico.ca> wrote: > > This is pretty interesting - where can I read more on this? Windows isn't > > actually hanging, one single command line window is - from its behaviour, it > > looks like the TCL postgresql package is waiting for pg_exec to come back > > from the commit (I believe the commit has actually gone through). > > > > It could even be that there's something wrong with the TCL package, but from > > my understanding it is one of the most complete interfaces out there - which > > is weird, because TCL seems to be the most unpopular language in the > > community. > > when it happens, make sure to query pg_locks and see what is going on > there lock issues are not supposed to manifest on a commit, which > releases locks, but you never know. There have been reports of > insonsistent lock ups on windows (espeically multi-processor) which > you might be experiencing. Make sure you have the very latest version > of pg 8.1.x. Also consider checking out 8.2 and see if you can > reproduce the behavior there...this will require compiling postgresql. Merlin, Rumour has it you managed to get a BT from Windows. That sounds like it would be very useful here. Carlo, Many things can happen at commit time. Temp tables dropped, TRUNCATEd old relations unlinked, init files removed, deferred foreign key checks (and subsequent cascading), dropped tables flushed. The assumption that COMMIT is a short request may not be correct according to the wide range of tasks that could occur according to standard SQL:2003 behaviour. Some of those effects take longer on larger systems. Any and all of those things have potential secondary effects, all of which can also conflict with other user tasks and especially with a CHECKPOINT. Then there's various forms of contention caused by misconfiguration. I do think we need some better instrumentation for this kind of thing. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
>Ben Trewern" <ben.trewern@_nospam_mowlem.com> wrote in message >news:ehsotr$20vp$1@news.hub.org... > It might be worth turning off hyperthreading if your Xeons are using it. > There have been reports of this causing inconsistent behaviour with > PostgreSQL. Yes, this issue comes up often - I wonder if the Woodcrest Xeons resolved this? Have these problems been experienced on both Linux and Windows (we are running Windows 2003 x64) Carlo
> I do think we need some better instrumentation for this kind of thing. Well, one thing's for sure - I have little other information to offer. The problem is that the lockups occur after hours of operation and thousands of rows being digested (which is the nature of the program). If "better instrumentation" implies tools to inpsect the sate of the db server's process and to know what it's waiting for from the OS, I agree. Then again, I can't even tell you whether the postgres process is at fault or the TCL interface - which would be odd, because it's one fo the most mature interfaces postgres has. So, here's a thought: is there any way for me to inspect the state of a postgres process to see if it's responsive - even if it's serving another connection? Carlo
On 10/28/06, Simon Riggs <simon@2ndquadrant.com> wrote: > On Thu, 2006-10-26 at 11:06 -0400, Merlin Moncure wrote: > > On 10/26/06, Carlo Stonebanks <stonec.register@sympatico.ca> wrote: > > > This is pretty interesting - where can I read more on this? Windows isn't > > > actually hanging, one single command line window is - from its behaviour, it > > > looks like the TCL postgresql package is waiting for pg_exec to come back > > > from the commit (I believe the commit has actually gone through). > > > > > > It could even be that there's something wrong with the TCL package, but from > > > my understanding it is one of the most complete interfaces out there - which > > > is weird, because TCL seems to be the most unpopular language in the > > > community. > > > > when it happens, make sure to query pg_locks and see what is going on > > there lock issues are not supposed to manifest on a commit, which > > releases locks, but you never know. There have been reports of > > insonsistent lock ups on windows (espeically multi-processor) which > > you might be experiencing. Make sure you have the very latest version > > of pg 8.1.x. Also consider checking out 8.2 and see if you can > > reproduce the behavior there...this will require compiling postgresql. > > Merlin, > > Rumour has it you managed to get a BT from Windows. That sounds like it > would be very useful here. > > Carlo, > > Many things can happen at commit time. Temp tables dropped, TRUNCATEd > old relations unlinked, init files removed, deferred foreign key checks > (and subsequent cascading), dropped tables flushed. The assumption that > COMMIT is a short request may not be correct according to the wide range > of tasks that could occur according to standard SQL:2003 behaviour. > > Some of those effects take longer on larger systems. Any and all of > those things have potential secondary effects, all of which can also > conflict with other user tasks and especially with a CHECKPOINT. Then > there's various forms of contention caused by misconfiguration. > > I do think we need some better instrumentation for this kind of thing. > > -- > Simon Riggs > EnterpriseDB http://www.enterprisedb.com start here: http://beta.linuxports.com/pgsql-hackers-win32/2005-08/msg00051.php merlin
> > Yes, this issue comes up often - I wonder if the Woodcrest Xeons resolved > this? Have these problems been experienced on both Linux and Windows (we > are > running Windows 2003 x64) > > Carlo > IIRC Woodcrest doesn't have HT, just dual core with shared cache. - Bucky
Merlin Moncure wrote: > On 10/28/06, Simon Riggs <simon@2ndquadrant.com> wrote: >> On Thu, 2006-10-26 at 11:06 -0400, Merlin Moncure wrote: >> > On 10/26/06, Carlo Stonebanks <stonec.register@sympatico.ca> wrote: >> > > This is pretty interesting - where can I read more on this? >> Windows isn't >> > > actually hanging, one single command line window is - from its >> behaviour, it >> > > looks like the TCL postgresql package is waiting for pg_exec to >> come back >> > > from the commit (I believe the commit has actually gone through). >> > > >> > > It could even be that there's something wrong with the TCL >> package, but from >> > > my understanding it is one of the most complete interfaces out >> there - which >> > > is weird, because TCL seems to be the most unpopular language in the >> > > community. >> > >> > when it happens, make sure to query pg_locks and see what is going on >> > there lock issues are not supposed to manifest on a commit, which >> > releases locks, but you never know. There have been reports of >> > insonsistent lock ups on windows (espeically multi-processor) which >> > you might be experiencing. Make sure you have the very latest version >> > of pg 8.1.x. Also consider checking out 8.2 and see if you can >> > reproduce the behavior there...this will require compiling postgresql. >> >> Merlin, >> >> Rumour has it you managed to get a BT from Windows. That sounds like it >> would be very useful here. Could it be there is a hangup in communication with the backend via the libpq library? I have a situation on Windows where psql seems to be hanging randomly AFTER completing (or almost completing) a vacuum full analyze verbose. I'm running the same databases on a single postgres instance on a Dell 4gb RAM 2 processor xeon (hyper-threading turned off) running Debian GNU/Linux. The windows system is an IBM 24gb RAM, 4 processor xeon (hyperthreading turned off). No problems on the Dell, it runs pgbench faster than the windows IBM system. The Dell Linux system zips through vacuumdb --all --analyze --full --verbose with no problems. The windows machine is running 6 instances of postgresql because of problems trying to load all of the databases into one instance on windows. The last output from psql is: INFO: free space map contains 474 pages in 163 relations DETAIL: A total of 2864 page slots are in use (including overhead). 2864 page slots are required to track all free space. Current limits are: 420000 page slots, 25000 relations, using 4154 KB. (I've currently restarted postgresql with more reasonable fsm_page_slots and fsm_relations). It appears that psql is hung in the call to WS2_32!select. The psql stack trace looks like this: ntdll!KiFastSystemCallRet ntdll!NtWaitForSingleObject+0xc mswsock!SockWaitForSingleObject+0x19d mswsock!WSPSelect+0x380 WS2_32!select+0xb9 WARNING: Stack unwind information not available. Following frames may be wrong. libpq!PQenv2encoding+0x1fb libpq!PQenv2encoding+0x3a1 libpq!PQenv2encoding+0x408 libpq!PQgetResult+0x58 libpq!PQgetResult+0x188 psql+0x4c0f psql+0x954d psql+0x11e7 psql+0x1238 kernel32!IsProcessorFeaturePresent+0x9e With more detail: # ChildEBP RetAddr Args to Child 00 0022f768 7c822124 71b23a09 000007a8 00000001 ntdll!KiFastSystemCallRet (FPO: [0,0,0]) 01 0022f76c 71b23a09 000007a8 00000001 0022f794 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0]) 02 0022f7a8 71b23a52 000007a8 00000780 00000000 mswsock!SockWaitForSingleObject+0x19d (FPO: [Non-Fpo]) 03 0022f898 71c0470c 00000781 0022fc40 0022fb30 mswsock!WSPSelect+0x380 (FPO: [Non-Fpo]) 04 0022f8e8 6310830b 00000781 0022fc40 0022fb30 WS2_32!select+0xb9 (FPO: [Non-Fpo]) WARNING: Stack unwind information not available. Following frames may be wrong. 05 0022fd68 631084b1 00000000 ffffffff 0000001d libpq!PQenv2encoding+0x1fb 06 0022fd88 63108518 00000001 00000000 00614e70 libpq!PQenv2encoding+0x3a1 07 0022fda8 631060f8 00000001 00000000 00614e70 libpq!PQenv2encoding+0x408 08 0022fdc8 63106228 00614e70 00613a71 00615188 libpq!PQgetResult+0x58 09 0022fde8 00404c0f 00614e70 00613a71 0041ac7a libpq!PQgetResult+0x188 0a 0022fe98 0040954d 00613a71 00423180 00423185 psql+0x4c0f 0b 0022ff78 004011e7 00000006 00613b08 00612aa8 psql+0x954d 0c 0022ffb0 00401238 00000001 00000009 0022fff0 psql+0x11e7 0d 0022ffc0 77e523e5 00000000 00000000 7ffdc000 psql+0x1238 0e 0022fff0 00000000 00401220 00000000 78746341 kernel32!IsProcessorFeaturePresent+0x9e the pg_locks table: -[ RECORD 1 ]-+---------------- locktype | relation database | 19553 relation | 10342 page | tuple | transactionid | classid | objid | objsubid | transaction | 1998424 pid | 576 mode | AccessShareLock granted | t -[ RECORD 2 ]-+---------------- locktype | transactionid database | relation | page | tuple | transactionid | 1998424 classid | objid | objsubid | transaction | 1998424 pid | 576 mode | ExclusiveLock granted | t The call stack on the postgres.exe process id 576: ntdll!KiFastSystemCallRet ntdll!NtWaitForMultipleObjects+0xc WARNING: Stack unwind information not available. Following frames may be wrong. kernel32!ResetEvent+0x45 postgres!pgwin32_waitforsinglesocket+0x89 postgres!pgwin32_recv+0x82 postgres!secure_read+0x7b postgres!TouchSocketFile+0x93 postgres!pq_getbyte+0x22 postgres!PostgresMain+0x1056 postgres!SubPostmasterMain+0x9ca postgres!main+0x33f postgres+0x11e7 postgres+0x1238 kernel32!IsProcessorFeaturePresent+0x9e These are the parameters: listen_addresses = '*' port = 5432 max_connections = 300 shared_buffers = 30000 temp_buffers = 5000 work_mem = 4096 max_fsm_pages = 25000 max_fsm_relations = 500 vacuum_cost_delay = 50 wal_buffers = 32 checkpoint_segments = 16 effective_cache_size = 50000 random_page_cost = 3 default_statistics_target = 300 log_destination = 'stderr' redirect_stderr = on (Since there's 24gb RAM on this thing, I've apparently gotten the system cache up to about 6 gb, according sysinternals.com "System Information" applet.) "Cache Data Map Hits %" runs at 100% Physical Disk Queue Lengths are almost non-existent Processors are not very busy at all Seems like once something is in memory, we never have to go back to the disk again except to write. (hope so with that kind of RAM). >> >> Carlo, >> >> Many things can happen at commit time. Temp tables dropped, TRUNCATEd >> old relations unlinked, init files removed, deferred foreign key checks >> (and subsequent cascading), dropped tables flushed. The assumption that >> COMMIT is a short request may not be correct according to the wide range >> of tasks that could occur according to standard SQL:2003 behaviour. >> >> Some of those effects take longer on larger systems. Any and all of >> those things have potential secondary effects, all of which can also >> conflict with other user tasks and especially with a CHECKPOINT. Then >> there's various forms of contention caused by misconfiguration. >> >> I do think we need some better instrumentation for this kind of thing. >> >> -- >> Simon Riggs >> EnterpriseDB http://www.enterprisedb.com > > start here: > http://beta.linuxports.com/pgsql-hackers-win32/2005-08/msg00051.php > > merlin > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings > >