Thread: 8.2.3: Server crashes on Windows using Eclipse/Junit
Hi,

I have a large number of tests I run in Eclipse to test my application. Many of them create and delete a lot of information in PG and at some point, PG will crash and restart. I get an error in the logs that states:

Server process exited with exit code -1073741502
.
.
.
Terminating connection because of crash of another server process

If it helps:

- I am using Windows XP
- I have 2 GB of memory
- I am using JPA/Hibernate3 and the Postgres Java driver

Any ideas?

L

--
Prenez la parole en public en étant    Speak to an audience while being
moins nerveux et plus convaincant!     less nervous and more convincing!
Éveillez l'orateur en vous!            Bring out the speaker in you!
Information: laurent@duperval.com  http://www.duperval.com  (514) 902-0186
On Mon, 15 Oct 2007, Laurent Duperval wrote:

> I have a large number of tests I run in Eclipse to test my application.
> Many of them create and delete a lot of information in PG and at some
> point, PG will crash and restart.
>
> I get an error in the logs that states:
>
> Server process exited with exit code -1073741502

This is likely a server bug. If you can isolate the failing test and extract
a self-contained example, someone can probably fix it.

Kris Jurka
On Mon, 15 Oct 2007 15:06:37 -0400, Kris Jurka wrote:

>> I get an error in the logs that states:
>>
>> Server process exited with exit code -1073741502
>
> This is likely a server bug. If you can isolate the failing test and
> extract a self-contained example, someone can probably fix it.

It seems to be some sort of interaction between Eclipse and the JUnit/Postgres
driver. When I run my tests, just before the server crashes, I have dozens
and dozens of spawned Postgres processes. When the crash occurs, all
processes are killed and restarted. And this cycle continues until the tests
complete.

When I run the tests from an ant script I also see some spawned processes,
but nothing like when running in Eclipse.

If I run each test case separately, I don't see this issue. But when I run
them as a whole (i.e. run all tests defined in my application) I get the
same error every time.

L
I will add that speed may be a factor as well: when I increase the amount of
logging by the PG server, I see the problem less often.

L

On Mon, 15 Oct 2007 17:58:48 +0000, Laurent Duperval wrote:

> Hi,
>
> I have a large number of tests I run in Eclipse to test my application.
> Many of them create and delete a lot of information in PG and at some
> point, PG will crash and restart.
>
> I get an error in the logs that states:
>
> Server process exited with exit code -1073741502
> .
> .
> .
> Terminating connection because of crash of another server process
>
> If it helps:
>
> - I am using Windows XP
> - I have 2 GB of memory
> - I am using JPA/Hibernate3 and the Postgres Java driver
>
> Any ideas?
>
> L
Laurent Duperval wrote:

> On Mon, 15 Oct 2007 15:06:37 -0400, Kris Jurka wrote:
>
>>> I get an error in the logs that states:
>>>
>>> Server process exited with exit code -1073741502
>>
>> This is likely a server bug. If you can isolate the failing test and
>> extract a self-contained example, someone can probably fix it.
>
> It seems to be some sort of interaction between Eclipse and the
> JUnit/Postgres driver. When I run my tests, just before the server
> crashes, I have dozens and dozens of spawned Postgres processes. When
> the crash occurs, all processes are killed and restarted. And this cycle
> continues until the tests complete.

The fact that all Postgres processes disappear is normal. Postgres itself
(more precisely, the postmaster process) kills all server processes when one
of them dies unexpectedly.

> When I run the tests from an ant script I also see some spawned processes,
> but nothing like when running in Eclipse.
>
> If I run each test case separately, I don't see this issue. But when I run
> them as a whole (i.e. run all tests defined in my application) I get the
> same error every time.

Maybe Eclipse is trying to run more of them at a time than ant, and the
extra concurrency is killing the server for some reason. Was this compiled
with Cygwin, or is it the native (mingw) version?

--
Alvaro Herrera              http://www.amazon.com/gp/registry/DXLWNGRJD34J
"La vida es para el que se aventura"
Alvaro Herrera <alvherre@commandprompt.com> writes:

> Maybe Eclipse is trying to run more of them at a time than ant, and the
> extra concurrency is killing the server for some reason. Was this
> compiled with Cygwin, or is it the native (mingw) version?

Don't both those builds have some hard-wired upper limit on the number of
child processes? I wonder what Laurent has max_connections set to... if it's
larger than the build could actually support, perhaps this behavior would be
the result.

regards, tom lane
On 10/16/07, Alvaro Herrera <alvherre@commandprompt.com> wrote:

> Laurent Duperval wrote:
>
>>>> I get an error in the logs that states:
>>>>
>>>> Server process exited with exit code -1073741502

FYI, this exit code means a DLL's initialization routine indicated failure
during process startup.

>> If I run each test case separately, I don't see this issue. But when I run
>> them as a whole (i.e. run all tests defined in my application) I get the
>> same error every time.
>
> Maybe Eclipse is trying to run more of them at a time than ant, and the
> extra concurrency is killing the server for some reason.

Sounds likely.
Hi,

Sorry for top-posting, but I am answering questions that don't all appear in
this message:

- I installed the default download of Postgres. I didn't compile it myself,
  so it's probably the mingw version.

- max_connections is set to 500. I did that originally because I kept seeing
  a message about no connection being available, and I thought it was because
  I was not allocating enough connections. My machine has 2 GB of RAM.

- How do I determine which DLL is failing and what is causing it to fail in
  its initialization routine?

Thanks,

L

On Tue, 16 Oct 2007 16:02:32 -0700, Trevor Talbot wrote:

> On 10/16/07, Alvaro Herrera <alvherre@commandprompt.com> wrote:
>> Laurent Duperval wrote:
>>>>> I get an error in the logs that states:
>>>>>
>>>>> Server process exited with exit code -1073741502
>
> FYI, this exit code means a DLL's initialization routine indicated
> failure during process startup.
>
>>> If I run each test case separately, I don't see this issue. But when I
>>> run them as a whole (i.e. run all tests defined in my application) I
>>> get the same error every time.
>>
>> Maybe Eclipse is trying to run more of them at a time than ant, and the
>> extra concurrency is killing the server for some reason.
>
> Sounds likely.
> Hi,
>
> Sorry for top-posting, but I am answering questions that don't all appear
> in this message:
>
> - I installed the default download of Postgres. I didn't compile it
>   myself, so it's probably the mingw version.

It is.

> - max_connections is set to 500. I did that originally because I kept
>   seeing a message about no connection being available, and I thought it
>   was because I was not allocating enough connections. My machine has
>   2 GB of RAM.

There's your problem. 500 is way above what the Windows version can handle.
IIRC the hard max is somewhere around 200, depending on some OS factors that
we don't entirely understand. I'd never recommend going above 100-150. With
no more than 2 GB of RAM, not above 100.

You'll need to figure out what's eating all your connections - it sounds
like it's not entirely expected. Perhaps connections are leaked somewhere?

> - How do I determine which DLL is failing and what is causing it to fail
>   in its initialization routine?

You really can't in this case, but if you could it wouldn't help you. It's
Windows running out of global resources.

/Magnus
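Magnus's leak theory is easy to picture: a test suite that opens a connection per test and never closes it piles up backends until the server falls over. A minimal, hypothetical sketch of the two patterns (the `FakePool` stub below stands in for a real driver or pool; it is not Laurent's code and not the JDBC API):

```python
# Illustration of how a test suite leaks connections when nothing closes
# them. FakePool is a stand-in: connect() hands back an object whose
# close() must be called, just like a real driver connection.
class FakePool:
    def __init__(self):
        self.open_count = 0

    def connect(self):
        self.open_count += 1
        return self  # stand-in for a connection object

    def close(self):
        self.open_count -= 1

# Leaky pattern: each "test" connects but never closes.
pool = FakePool()
for _ in range(50):
    pool.connect()
print(pool.open_count)   # 50 connections still open after 50 tests

# Correct pattern: close in a finally block (or a context manager),
# so connections are released even when a test fails.
pool2 = FakePool()
for _ in range(50):
    conn = pool2.connect()
    try:
        pass  # the test's queries would run here
    finally:
        conn.close()
print(pool2.open_count)  # 0
```

With 500 tests and the leaky pattern, hitting a max_connections of 500 mid-run is exactly what you'd expect.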
"Magnus Hagander" <magnus@hagander.net> writes:

>> - max_connections is set to 500.
>
> There's your problem. 500 is way above what the Windows version can
> handle. IIRC the hard max is somewhere around 200, depending on some OS
> factors that we don't entirely understand.

Maybe we should put an #ifdef WIN32 into guc.c to limit max_connections to
something we know the platform can stand? It'd be more comfortable if we
understood exactly where the limit was, but I think I'd rather have an "I'm
sorry Dave, I can't do that" than random-seeming crashes.

regards, tom lane
On Wed, Oct 17, 2007 at 02:40:14AM -0400, Tom Lane wrote:

> "Magnus Hagander" <magnus@hagander.net> writes:
>>> - max_connections is set to 500.
>>
>> There's your problem. 500 is way above what the Windows version can
>> handle. IIRC the hard max is somewhere around 200, depending on some OS
>> factors that we don't entirely understand.
>
> Maybe we should put an #ifdef WIN32 into guc.c to limit max_connections
> to something we know the platform can stand? It'd be more comfortable
> if we understood exactly where the limit was, but I think I'd rather
> have an "I'm sorry Dave, I can't do that" than random-seeming crashes.

Yeah, that's probably a good idea - except we never managed to figure out
where the limit is. It appears to vary pretty wildly between different
machines, for reasons we don't really understand (total RAM has some effect
on it, but it's not the only one, for example).

//Magnus
On Wed, 17 Oct 2007 09:22:11 +0200 Magnus Hagander <magnus@hagander.net> wrote:

>> Maybe we should put an #ifdef WIN32 into guc.c to limit
>> max_connections to something we know the platform can stand? It'd
>> be more comfortable if we understood exactly where the limit was,
>> but I think I'd rather have an "I'm sorry Dave, I can't do that"
>> than random-seeming crashes.
>
> Yeah, that's probably a good idea - except we never managed to
> figure out where the limit is. It appears to vary pretty wildly
> between different machines, for reasons we don't really understand
> (total RAM has some effect on it, but it's not the only one, for
> example).

How about we just emit a warning:

WARNING: Connections above 250 on Windows platforms may have unpredictable
results.

Joshua D. Drake

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997   http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
"Magnus Hagander" wrote:

>> - max_connections is set to 500. I did that originally because I kept
>> seeing a message about no connection being available, and I thought it
>> was because I was not allocating enough connections. My machine has
>> 2 GB of RAM.
>
> There's your problem. 500 is way above what the Windows version can
> handle. IIRC the hard max is somewhere around 200, depending on some OS
> factors that we don't entirely understand. I'd never recommend going
> above 100-150. With no more than 2 GB of RAM, not above 100.

My guess is that Windows is running out of handles. Each backend uses about
150 handles. 100 backends means 15,000 handles. Depending on how many other
programs are currently running, the number of startable backends will vary
with the total handle limit Windows imposes.

Rainer
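Rainer's numbers are simple to model. The sketch below takes his ~150 handles per backend as a given (it is a measurement from his system, not a documented Windows constant):

```python
# Estimate total kernel handles held by Postgres server processes,
# using Rainer's observed figure of ~150 handles per backend.
HANDLES_PER_BACKEND = 150  # measured on one system; not a spec value

def estimated_handles(backends):
    """Rough handle total for `backends` concurrent server processes."""
    return backends * HANDLES_PER_BACKEND

print(estimated_handles(100))  # 15000, the figure quoted above
print(estimated_handles(500))  # 75000 at Laurent's max_connections setting
```

Under this model Laurent's max_connections = 500 implies roughly five times the handle load Rainer considers safe, before counting any other programs' demand.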
>>> Maybe we should put an #ifdef WIN32 into guc.c to limit
>>> max_connections to something we know the platform can stand? It'd
>>> be more comfortable if we understood exactly where the limit was,
>>> but I think I'd rather have an "I'm sorry Dave, I can't do that"
>>> than random-seeming crashes.
>>
>> Yeah, that's probably a good idea - except we never managed to
>> figure out where the limit is. It appears to vary pretty wildly
>> between different machines, for reasons we don't really understand
>> (total RAM has some effect on it, but it's not the only one, for
>> example).
>
> How about we just emit a warning:
>
> WARNING: Connections above 250 on Windows platforms may have
> unpredictable results.

That's probably a better idea. I'll go look at that, unless people feel we
should just stick it in the docs/FAQ?

/Magnus
"Magnus Hagander" <magnus@hagander.net> writes:

>> How about we just emit a warning:
>>
>> WARNING: Connections above 250 on Windows platforms may have
>> unpredictable results.
>
> That's probably a better idea. I'll go look at that, unless people feel
> we should just stick it in the docs/FAQ?

Unless we've got some credible basis for citing a particular number, I don't
think this will help much.

Rainer Bauer <usenet@munnin.com> writes:

> My guess is that Windows is running out of handles. Each backend uses
> about 150 handles. 100 backends means 15,000 handles. Depending on how
> many other programs are currently running, the number of startable
> backends will vary with the total handle limit Windows imposes.

I find this theory very interesting; for one thing it explains the reported
variability of results, since the non-Postgres demand for handles could be
anything. Is there any way we could check it?

If it's accurate, what we ought to be whining about is some combination of
max_connections and max_files_per_process, rather than only considering the
former.

regards, tom lane
Tom Lane wrote:

> "Magnus Hagander" <magnus@hagander.net> writes:
>>> How about we just emit a warning:
>>>
>>> WARNING: Connections above 250 on Windows platforms may have
>>> unpredictable results.
>>
>> That's probably a better idea. I'll go look at that, unless people feel
>> we should just stick it in the docs/FAQ?
>
> Unless we've got some credible basis for citing a particular number,
> I don't think this will help much.

OK. Maybe a note in the docs or FAQ at least?

> Rainer Bauer <usenet@munnin.com> writes:
>> My guess is that Windows is running out of handles. Each backend uses
>> about 150 handles. 100 backends means 15,000 handles. Depending on how
>> many other programs are currently running, the number of startable
>> backends will vary with the total handle limit Windows imposes.
>
> I find this theory very interesting; for one thing it explains the
> reported variability of results, since the non-Postgres demand for
> handles could be anything. Is there any way we could check it?
> If it's accurate, what we ought to be whining about is some
> combination of max_connections and max_files_per_process, rather
> than only considering the former.

It's not that simple. Merlin ran some checks, and drastically reducing
max_files_per_process made no measurable difference.

My best guess is it's due to the non-paged pool. Handles are a part of what
goes in there, but only a part.

//Magnus
On 10/17/07, Magnus Hagander <magnus@hagander.net> wrote:

> On Wed, Oct 17, 2007 at 02:40:14AM -0400, Tom Lane wrote:
>> Maybe we should put an #ifdef WIN32 into guc.c to limit max_connections
>> to something we know the platform can stand? It'd be more comfortable
>> if we understood exactly where the limit was, but I think I'd rather
>> have an "I'm sorry Dave, I can't do that" than random-seeming crashes.
>
> Yeah, that's probably a good idea - except we never managed to figure
> out where the limit is. It appears to vary pretty wildly between
> different machines, for reasons we don't really understand (total RAM
> has some effect on it, but it's not the only one, for example).

I tried generating idle connections in an effort to reproduce Laurent's
problem, but I ran into a local limit instead: for each backend, postmaster
creates a thread and burns 4 MB of its 2 GB address space. It fails around
490. Laurent's issue must depend on other load characteristics.

It's possible to get a trace of DLL loads, but I haven't found a noninvasive
way of doing that. It seems to require that a debugger be attached.
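The arithmetic behind that ~490 figure is worth spelling out. A sketch under the assumptions Trevor states (a 32-bit process with a 2 GB user address space and a 4 MB stack reservation per thread):

```python
# Why postmaster's address space runs out near 490 backend threads.
MB = 1024 * 1024
user_address_space = 2 * 1024 * MB  # 2 GB for a 32-bit Windows process
stack_reserve_per_thread = 4 * MB   # this build reserves 4 MB per thread

# Hard ceiling if thread stacks were the only thing in the address space:
ceiling = user_address_space // stack_reserve_per_thread
print(ceiling)  # 512

# The executable image, heaps, DLLs, and the shared-memory mapping occupy
# the remainder, so failing "around 490" threads fits this model.
```

Note this is a limit on postmaster's virtual address space, not on kernel handles, which is why it shows up regardless of how cheap the handles themselves are.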
On 10/20/07, Rainer Bauer <usenet@munnin.com> wrote:

> "Magnus Hagander" wrote:
>
>>>> - max_connections is set to 500. I did that originally because I kept
>>>> seeing a message about no connection being available, and I thought
>>>> it was because I was not allocating enough connections. My machine
>>>> has 2 GB of RAM.
>>>
>>> There's your problem. 500 is way above what the Windows version can
>>> handle. IIRC the hard max is somewhere around 200, depending on some
>>> OS factors that we don't entirely understand. I'd never recommend
>>> going above 100-150. With no more than 2 GB of RAM, not above 100.
>>
>> My guess is that Windows is running out of handles. Each backend uses
>> about 150 handles. 100 backends means 15,000 handles. Depending on how
>> many other programs are currently running, the number of startable
>> backends will vary with the total handle limit Windows imposes.

Those are kernel object handles; the ceiling does depend on available kernel
memory, but they're cheap, and postgres is in no danger of running into that
limit. Most of the handle limits people talk about are on USER (window etc.)
objects, which come from a single shared pool.
--- "Joshua D. Drake" <jd@commandprompt.com> wrote:

> How about we just emit a warning:
>
> WARNING: Connections above 250 on Windows platforms may have
> unpredictable results.
>
> Joshua D. Drake

I'd personally vote for a lower warning limit like 175, as I can
consistently crash PostgreSQL on a Windows system right around the 200th
connection.

Regards,

Shelby Cain

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com
On 10/20/07, Shelby Cain <alyandon@yahoo.com> wrote:

> I'd personally vote for a lower warning limit like 175, as I can
> consistently crash PostgreSQL on a Windows system right around the 200th
> connection.

What error gets logged for your crashes?
"Trevor Talbot" wrote:

> On 10/20/07, Rainer Bauer <usenet@munnin.com> wrote:
>> My guess is that Windows is running out of handles. Each backend uses
>> about 150 handles. 100 backends means 15,000 handles. Depending on how
>> many other programs are currently running, the number of startable
>> backends will vary with the total handle limit Windows imposes.
>
> Those are kernel object handles; the ceiling does depend on available
> kernel memory, but they're cheap, and postgres is in no danger of
> running into that limit. Most of the handle limits people talk about
> are on USER (window etc.) objects, which come from a single shared
> pool.

You are right. I just did a quick test, and depending on the handle type
these limits are quite high. I could create 5 million events, 4 million
semaphores, or 3.5 million mutexes before the system returned error 1816
ERROR_NOT_ENOUGH_QUOTA "Not enough quota is available to process this
command.".

Rainer
--- Trevor Talbot <quension@gmail.com> wrote:

> On 10/20/07, Shelby Cain <alyandon@yahoo.com> wrote:
>
>> I'd personally vote for a lower warning limit like 175, as I can
>> consistently crash PostgreSQL on a Windows system right around the
>> 200th connection.
>
> What error gets logged for your crashes?

It's been a while, but IIRC there wasn't anything in the logs other than an
entry noting that a backend had crashed unexpectedly, so the postmaster was
restarting all active backends. I can trivially reproduce it at work on my
workstation if you need the exact error text.

Regards,

Shelby Cain
I wrote:

> You are right. I just did a quick test, and depending on the handle type
> these limits are quite high. I could create 5 million events, 4 million
> semaphores, or 3.5 million mutexes before the system returned error 1816
> ERROR_NOT_ENOUGH_QUOTA "Not enough quota is available to process this
> command.".

[Does some further testing]

The limit is high, but nonetheless Postgres is running out of handles.
Setting <max_connections> to 10000 and starting Postgres _without_ any
connection consumes 40,000 handles. This corresponds to the 4 Postgres
processes running after the server was started. Every new connection
consumes another 10,000 handles.

I don't know the Postgres code involved, but it seems that every backend
consumes at least <max_connections> handles. Hence increasing this value
will have the opposite effect once a certain threshold is met.

Rainer
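Rainer's observation implies a simple model, sketched here for clarity using his numbers: roughly max_connections handles per server process, with 4 processes present at startup.

```python
# Model of handle consumption implied by Rainer's measurements: each
# server process holds about max_connections handles (one per pre-created
# semaphore), and 4 processes exist before any client connects.
def total_handles(max_connections, connections, base_processes=4):
    return (base_processes + connections) * max_connections

print(total_handles(10000, 0))  # 40000 handles at startup, as observed
print(total_handles(10000, 1))  # 50000 after the first connection
```

The model makes the "opposite effect" concrete: raising max_connections to allow more clients multiplies the handle cost of every backend that does connect.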
On 10/20/07, Shelby Cain <alyandon@yahoo.com> wrote:

> --- Trevor Talbot <quension@gmail.com> wrote:
>
>> What error gets logged for your crashes?
>
> It's been a while, but IIRC there wasn't anything in the logs other than
> an entry noting that a backend had crashed unexpectedly, so the
> postmaster was restarting all active backends. I can trivially
> reproduce it at work on my workstation if you need the exact error
> text.

I think it would be useful; if nothing else, maybe it'll tell us if you can
see the same problem Laurent does, or if it's a different limit entirely.
Shelby Cain wrote:

> --- Trevor Talbot <quension@gmail.com> wrote:
>
>> What error gets logged for your crashes?
>
> It's been a while, but IIRC there wasn't anything in the logs other than
> an entry noting that a backend had crashed unexpectedly, so the
> postmaster was restarting all active backends. I can trivially
> reproduce it at work on my workstation if you need the exact error
> text.

I could reproduce this here:

Server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request

2007-10-20 23:33:42 LOG:  server process (PID 5240) exited with exit code
-1073741502

Shelby, are you using the /3GB switch by chance? This will halve the number
of available handles on your system.

Rainer
On 10/20/07, Rainer Bauer <usenet@munnin.com> wrote:

> I could reproduce this here:
>
> Server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request
>
> 2007-10-20 23:33:42 LOG:  server process (PID 5240) exited with exit code
> -1073741502

How?
--- Rainer Bauer <usenet@munnin.com> wrote:

> I could reproduce this here:
>
> Server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request
>
> 2007-10-20 23:33:42 LOG:  server process (PID 5240) exited with exit code
> -1073741502
>
> Shelby, are you using the /3GB switch by chance? This will halve the
> number of available handles on your system.
>
> Rainer

Probably not, although I haven't examined boot.ini. My workstation only has
1.5 GB of RAM, so I'm highly doubtful that IBM would have configured it to
boot with the /3GB switch.

Regards,

Shelby Cain
I wrote:

> Anyway, the problem is the number of semaphores created by Postgres:
> every backend creates at least 4*<max_connections> semaphores.

Sorry, this must read <max_connections> semaphores, not 4 times that.

Rainer
Hello Trevor,

Sunday, October 21, 2007, 12:15:25 AM, you wrote:

TT> On 10/20/07, Rainer Bauer <usenet@munnin.com> wrote:
>> I could reproduce this here:
>>
>> Server closed the connection unexpectedly
>>         This probably means the server terminated abnormally
>>         before or while processing the request
>>
>> 2007-10-20 23:33:42 LOG:  server process (PID 5240) exited with exit
>> code -1073741502

TT> How?

Seems like the mailing list is not catching up fast enough (I am posting
through Usenet)...

Anyway, the problem is the number of semaphores created by Postgres: every
backend creates at least 4*<max_connections> semaphores. Just increase
<max_connections> to an unusually high value (say 10000) and start creating
new connections while monitoring the handle count.

Rainer
On 10/20/07, Rainer Bauer <usenet@munnin.com> wrote:

> Anyway, the problem is the number of semaphores created by Postgres:
> every backend creates at least 4*<max_connections> semaphores. Just
> increase <max_connections> to an unusually high value (say 10000) and
> start creating new connections while monitoring the handle count.

Hmm, they're actually the same semaphores, so the only cost is for slots in
each process's handle table, which comes from kernel paged pool. Testing
shows I can easily create about 30 million handles to a given object on this
machine. This is under win2003 with 1.25 GB RAM, which gives it a paged pool
limit of 352 MB.

I tried going up to 20000 max_connections, and still blew postmaster's VM
space long before paged pool was exhausted. I couldn't test any higher
values, as there's some interaction between max_connections and
shared_buffers that prevents it from mapping the buffer contiguously.

Something's missing though, since I'm not hitting the same issue you are.
How are you generating the connections? I just have an app calling
PQconnectdb() in a loop, but I guess that's not good enough.
Trevor Talbot wrote:

> On 10/20/07, Rainer Bauer <usenet@munnin.com> wrote:
>
>> Anyway, the problem is the number of semaphores created by Postgres:
>> every backend creates at least 4*<max_connections> semaphores. Just
>> increase <max_connections> to an unusually high value (say 10000) and
>> start creating new connections while monitoring the handle count.
>
> Hmm, they're actually the same semaphores, so the only cost is for
> slots in each process's handle table, which comes from kernel paged
> pool. Testing shows I can easily create about 30 million handles to a
> given object on this machine. This is under win2003 with 1.25 GB RAM,
> which gives it a paged pool limit of 352 MB.
>
> I tried going up to 20000 max_connections, and still blew postmaster's
> VM space long before paged pool was exhausted. I couldn't test any
> higher values, as there's some interaction between max_connections and
> shared_buffers that prevents it from mapping the buffer contiguously.
>
> Something's missing though, since I'm not hitting the same issue you
> are. How are you generating the connections? I just have an app
> calling PQconnectdb() in a loop, but I guess that's not good enough.

Yeah, something is obviously missing. Are you guys on exactly the same
Windows versions? WRT both version and service pack. Anybody on x64 Windows?

Another thing worth testing - check if the amount of shared memory used
makes a noticeable difference. Try both very small and very large values.

I don't think the paged pool is the problem - I think it's the nonpaged
pool. Would be interesting to track that one in the failing case (using
performance monitor, up to the point where it fails). And the nonpaged one
is smaller...

If that looks like it's the problem, it could be helpful to do a pooltag
trace on it (see for example
http://blogs.msdn.com/ntdebugging/archive/2006/12/18/Understanding-Pool-Consumption-and-Event-ID_3A00_--2020-or-2019.aspx)

//Magnus
Trevor Talbot wrote:

> On 10/17/07, Magnus Hagander <magnus@hagander.net> wrote:
>> On Wed, Oct 17, 2007 at 02:40:14AM -0400, Tom Lane wrote:
>>> Maybe we should put an #ifdef WIN32 into guc.c to limit
>>> max_connections to something we know the platform can stand? It'd be
>>> more comfortable if we understood exactly where the limit was, but I
>>> think I'd rather have an "I'm sorry Dave, I can't do that" than
>>> random-seeming crashes.
>>
>> Yeah, that's probably a good idea - except we never managed to figure
>> out where the limit is. It appears to vary pretty wildly between
>> different machines, for reasons we don't really understand (total RAM
>> has some effect on it, but it's not the only one, for example).
>
> I tried generating idle connections in an effort to reproduce
> Laurent's problem, but I ran into a local limit instead: for each
> backend, postmaster creates a thread and burns 4 MB of its 2 GB address
> space. It fails around 490.

Oh, that's interesting. That's actually a side effect of us increasing the
stack size for the postgres.exe executable in order to work on other things.
By default, it burns 1 MB/thread, but ours will do 4 MB. Never really
thought of the problem that it'll run out of address space. Unfortunately,
that size can't be changed in the CreateThread() call - only the initially
committed size can be changed there.

There are two ways to get around it. One is not using a thread for each
backend, but a single thread that handles them all and some sync objects
around it. We originally considered this but decided not to bother changing
it, because the current way is simpler and the overhead of a thread is tiny
compared to a process. I don't think anybody even thought about the fact
that it'd run you out of address space...

The other way is to finish off win64 support :-) Which I plan to look at,
but I don't think that alone should be considered a solution.
The question is whether it's worth fixing that part, if it will just fall
down for other reasons before we reach those 500 connections anyway.

Can you try having your program actually run some queries and so on, not
just do a PQconnect? To see if it falls over then, because it's been doing
more?

> Laurent's issue must depend on other load characteristics. It's
> possible to get a trace of DLL loads, but I haven't found a
> noninvasive way of doing that. It seems to require that a debugger be
> attached.

AFAIK, it does require that, yes.

//Magnus
Magnus Hagander wrote:

> Trevor Talbot wrote:
>> On 10/20/07, Rainer Bauer <usenet@munnin.com> wrote:
>>
>>> Anyway, the problem is the number of semaphores created by Postgres:
>>> every backend creates at least 4*<max_connections> semaphores. Just
>>> increase <max_connections> to an unusually high value (say 10000) and
>>> start creating new connections while monitoring the handle count.
>>
>> Hmm, they're actually the same semaphores, so the only cost is for
>> slots in each process's handle table, which comes from kernel paged
>> pool. Testing shows I can easily create about 30 million handles to a
>> given object on this machine. This is under win2003 with 1.25 GB RAM,
>> which gives it a paged pool limit of 352 MB.

On my system I can only create about 4 million semaphores.

>> I tried going up to 20000 max_connections, and still blew postmaster's
>> VM space long before paged pool was exhausted. I couldn't test any
>> higher values, as there's some interaction between max_connections and
>> shared_buffers that prevents it from mapping the buffer contiguously.
>>
>> Something's missing though, since I'm not hitting the same issue you
>> are. How are you generating the connections? I just have an app
>> calling PQconnectdb() in a loop, but I guess that's not good enough.

I am using the ASCII version of the psqlODBC driver version 8.2.4.2 to
establish the test connections.

> Yeah, something is obviously missing. Are you guys on exactly the same
> Windows versions? WRT both version and service pack. Anybody on x64
> Windows?

No, I am using WinXP SP2 32-bit with 2 GB RAM.
These are my altered settings from the default 8.2.5 Postgres installation:

ssl = on
shared_buffers = 512MB
work_mem = 16MB
maintenance_work_mem = 256MB
wal_sync_method = fsync_writethrough
checkpoint_segments = 15
checkpoint_timeout = 30min
random_page_cost = 3.0
effective_cache_size = 1GB
autovacuum_vacuum_scale_factor = 0.10
autovacuum_analyze_scale_factor = 0.05

>Another thing worth testing - check if the amount of shared memory used
>makes a noticeable difference. Try both very small and very large values.

Well I tried different shared_buffers settings, but the result was
consistent: with max_connections set to 10000, I can create 150 database
connections.

However, I checked the handle count at the moment the last connection fails
and it is only at 1.5 million. So it seems the handles are not the primary
problem.

Let me know if you want any other tests performed on this machine. I also
have VS2005 installed, but until now I haven't compiled Postgres here (I
was waiting for 8.3, which fully supports building with VS).

Rainer
On Sun, Oct 21, 2007 at 09:43:27PM +0200, Rainer Bauer wrote:
> Magnus Hagander wrote:
>
> >Trevor Talbot wrote:
> >> On 10/20/07, Rainer Bauer <usenet@munnin.com> wrote:
> >>
> >>> Anyway, the problem is the no. of semaphores created by Postgres:
> >>> Every backend creates at least 4*<max_connections> semaphores. Just
> >>> increase <max_connections> to an unusually high value (say 10000) and
> >>> start creating new connections while monitoring the handle count.
> >>
> >> Hmm, they're actually the same semaphores, so the only cost is for
> >> slots in each process's handle table, which comes from kernel paged
> >> pool. Testing shows I can easily create about 30 million handles to a
> >> given object on this machine. This is under win2003 with 1.25GB RAM,
> >> which gives it a paged pool limit of 352MB.
>
> On my system I can only create about 4 million semaphores.

Is that 4 million semaphores, or 4 million handles to a smaller number of
semaphores?

> >> I tried going up to 20000 max_connections, and still blew postmaster's
> >> VM space long before paged pool was exhausted. I couldn't test any
> >> higher values, as there's some interaction between max_connections and
> >> shared_buffers that prevents it from mapping the buffer contiguously.
> >>
> >> Something's missing though, since I'm not hitting the same issue you
> >> are. How are you generating the connections? I just have an app
> >> calling PQconnectdb() in a loop, but I guess that's not good enough.
>
> I am using the ASCII version of the psqlODBC driver version 8.2.4.2 to
> establish the test connections.

Could you try the same tests with the client running on a different system?
Since the client eats up a bunch of handles and such as well, that would
eliminate the difference due to different clients.

> >Yeah, something is obviously missing.. Are you guys on exactly the
> >same Windows versions? WRT both version and service pack. Anybody on x64
> >Windows?
>
> No, I am using WinXP SP2 32 bit with 2GB RAM.

Ok. So one is on XP and one is on 2003. That's interesting - given that
2003 is tuned towards servers, it doesn't surprise me that it allows more
clients before breaking.

> These are my altered settings from the default 8.2.5 Postgres installation:
> ssl = on

Does it make a difference if you turn this off?

> shared_buffers = 512MB

As a general note, this is *way* too high. All evidence I've seen points to
that you should have shared_buffers as *small* as possible on win32,
because memory access there is slow. And leave more of the caching up to
the OS.

> work_mem = 16MB
> maintenance_work_mem = 256MB
> wal_sync_method = fsync_writethrough
> checkpoint_segments = 15
> checkpoint_timeout = 30min
> random_page_cost = 3.0
> effective_cache_size = 1GB
> autovacuum_vacuum_scale_factor = 0.10
> autovacuum_analyze_scale_factor = 0.05

None of those should make a difference on this.

> >Another thing worth testing - check if the amount of shared memory used
> >makes a noticeable difference. Try both very small and very large values.
>
> Well I tried different shared_buffers settings, but the result was
> consistent: with max_connections set to 10000, I can create 150 database
> connections.

Ok. But if you decrease max_connections, you can have more connections? Or
the other way around?

> However, I checked the handle count at the moment the last connection fails
> and it is only at 1.5 million. So it seems the handles are not the primary
> problem.

Good, it shouldn't be, but it's good to have that confirmed.

/Magnus
On Mon, Oct 22, 2007 at 10:23:16AM +0200, Magnus Hagander wrote:
> > >> I tried going up to 20000 max_connections, and still blew postmaster's
> > >> VM space long before paged pool was exhausted. I couldn't test any
> > >> higher values, as there's some interaction between max_connections and
> > >> shared_buffers that prevents it from mapping the buffer contiguously.
> > >>
> > >> Something's missing though, since I'm not hitting the same issue you
> > >> are. How are you generating the connections? I just have an app
> > >> calling PQconnectdb() in a loop, but I guess that's not good enough.
> >
> > I am using the ASCII version of the psqlODBC driver version 8.2.4.2 to
> > establish the test connections.
>
> Could you try the same tests with the client running on a different system?
> Since the client eats up a bunch of handles and such as well, that would
> eliminate the difference due to different clients.

Followup, when running these tests, could you check using Process Explorer
if you're hitting close to the limit of either of the two pools? See
http://blogs.technet.com/askperf/archive/2007/03/07/memory-management-understanding-pool-resources.aspx

//Magnus
On Mon, Oct 22, 2007 at 10:41:14AM +0200, Magnus Hagander wrote:
> On Mon, Oct 22, 2007 at 10:23:16AM +0200, Magnus Hagander wrote:
> > > >> I tried going up to 20000 max_connections, and still blew postmaster's
> > > >> VM space long before paged pool was exhausted. I couldn't test any
> > > >> higher values, as there's some interaction between max_connections and
> > > >> shared_buffers that prevents it from mapping the buffer contiguously.
> > > >>
> > > >> Something's missing though, since I'm not hitting the same issue you
> > > >> are. How are you generating the connections? I just have an app
> > > >> calling PQconnectdb() in a loop, but I guess that's not good enough.
> > >
> > > I am using the ASCII version of the psqlODBC driver version 8.2.4.2 to
> > > establish the test connections.
> >
> > Could you try the same tests with the client running on a different system?
> > Since the client eats up a bunch of handles and such as well, that would
> > eliminate the difference due to different clients.
>
> Followup, when running these tests, could you check using Process Explorer
> if you're hitting close to the limit of either of the two pools? See
> http://blogs.technet.com/askperf/archive/2007/03/07/memory-management-understanding-pool-resources.aspx

Another followup. Been working with Dave on and off today (well, him mostly
on to be honest, me a bit more on and off), and it seems that both our
repros clearly blame the desktop heap, and nothing else. Please use the
desktop heap tool and see if it breaks when the desktop heap usage
approaches 100%:

http://www.microsoft.com/downloads/details.aspx?familyid=5cfc9b74-97aa-4510-b4b9-b2dc98c8ed8b&displaylang=en

It'd still be good to know why the difference is so big between your two
systems.

//Magnus
Magnus Hagander wrote:
> Another followup. Been working with Dave on and off today (well, him mostly
> on to be honest, me a bit more on and off), and it seems that both our
> repros clearly blame the desktop heap, and nothing else. Please use the
> desktop heap tool and see if it breaks when the desktop heap usage
> approaches 100%:
>
> http://www.microsoft.com/downloads/details.aspx?familyid=5cfc9b74-97aa-4510-b4b9-b2dc98c8ed8b&displaylang=en
>
> It'd still be good to know why the difference is so big between your two
> systems.

Further info on this for the record - on XP Pro (which I'm testing on), the
desktop heap size defaults to 512KB for non-interactive sessions and 3072KB
for interactive.

In testing I find that I can get up to around 46 or so connections when
running as a service before desktop heap is exhausted and postgres dies.

When running interactively I can get a little over 125 connections before
things start dying; however, in that case it's *not* because of desktop
heap, because I can start a second cluster and run 125 connections on each
simultaneously. If I run three instances up together, one of them will die
as soon as desktop heap gets to 100% usage.

So, we seem to be hitting two limits here - the desktop heap, and something
else which is cluster-specific. Investigation continues...

Regards, Dave
Magnus Hagander schrieb:

>On Sun, Oct 21, 2007 at 09:43:27PM +0200, Rainer Bauer wrote:
>> Magnus Hagander wrote:
>>
>> >Trevor Talbot wrote:
>> >> On 10/20/07, Rainer Bauer <usenet@munnin.com> wrote:
>> >>
>> >>> Anyway, the problem is the no. of semaphores created by Postgres:
>> >>> Every backend creates at least 4*<max_connections> semaphores. Just
>> >>> increase <max_connections> to an unusually high value (say 10000) and
>> >>> start creating new connections while monitoring the handle count.
>> >>
>> >> Hmm, they're actually the same semaphores, so the only cost is for
>> >> slots in each process's handle table, which comes from kernel paged
>> >> pool. Testing shows I can easily create about 30 million handles to a
>> >> given object on this machine. This is under win2003 with 1.25GB RAM,
>> >> which gives it a paged pool limit of 352MB.
>>
>> On my system I can only create about 4 million semaphores.
>
>Is that 4 million semaphores, or 4 million handles to a smaller number of
>semaphores?

No, 4 million distinct semaphores by calling:

CreateSemaphore( NULL, 0, 1, NULL );

>> >> I tried going up to 20000 max_connections, and still blew postmaster's
>> >> VM space long before paged pool was exhausted. I couldn't test any
>> >> higher values, as there's some interaction between max_connections and
>> >> shared_buffers that prevents it from mapping the buffer contiguously.
>> >>
>> >> Something's missing though, since I'm not hitting the same issue you
>> >> are. How are you generating the connections? I just have an app
>> >> calling PQconnectdb() in a loop, but I guess that's not good enough.
>>
>> I am using the ASCII version of the psqlODBC driver version 8.2.4.2 to
>> establish the test connections.
>
>Could you try the same tests with the client running on a different system?
>Since the client eats up a bunch of handles and such as well, that would
>eliminate the difference due to different clients.
>Followup, when running these tests, could you check using Process Explorer
>if you're hitting close to the limit of either of the two pools? See
>http://blogs.technet.com/askperf/archive/2007/03/07/memory-management-understanding-pool-resources.aspx

Well, after installing Process Explorer and starting the system information
program, the kernel memory section shows me the current count, but not the
limits (it says "no symbols"). I am currently downloading the "Debugging
Tools for Windows". Maybe these limits are shown after the installation.

I just repeated the test with a local connection. After 150 connections,
the following values are displayed:

Paged physical  113000
Paged virtual   120000
Nonpaged         28000

Also there are 1,583,182 handles open. I will check the behaviour with a
remote connection later (have to go now...).

>> These are my altered settings from the default 8.2.5 Postgres installation:
>> ssl = on
>
>Does it make a difference if you turn this off?

No.

>> shared_buffers = 512MB
>
>As a general note, this is *way* too high. All evidence I've seen points to
>that you should have shared_buffers as *small* as possible on win32,
>because memory access there is slow. And leave more of the caching up to
>the OS.

I followed Josh's advice here:
<http://archives.postgresql.org/pgsql-performance/2007-06/msg00606.php>

What value would you recommend then? The default 32MB?

>> >Another thing worth testing - check if the amount of shared memory used
>> >makes a noticeable difference. Try both very small and very large values.
>>
>> Well I tried different shared_buffers settings, but the result was
>> consistent: with max_connections set to 10000, I can create 150 database
>> connections.
>
>Ok. But if you decrease max_connections, you can have more connections? Or
>the other way around?

A few tests indicated that the maximum no. of connections is 150,
regardless of the <max_connections> setting. But I will have to check
whether this is somehow caused by the ODBC driver.

Rainer
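For the record, the semaphore-exhaustion test Rainer describes boils down to a loop like the following. This is a Win32-only sketch reconstructed from the thread, not his actual test program; the CreateSemaphore() arguments mirror the call he quotes.

```c
/* Win32-only sketch: create distinct semaphores in a loop until the
   kernel refuses, to find the per-process/per-system limit. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    unsigned long count = 0;

    for (;;)
    {
        /* Each call creates a *new* nameless semaphore, so this burns
           kernel objects, not just handle-table slots. */
        HANDLE h = CreateSemaphore(NULL, 0, 1, NULL);

        if (h == NULL)
        {
            printf("failed after %lu semaphores (error %lu)\n",
                   count, (unsigned long) GetLastError());
            break;
        }
        count++;
        /* Handles are deliberately leaked until process exit; the point
           is to find the ceiling, as in the ~4 million figure reported. */
    }
    return 0;
}
```

This distinction matters for the thread: many handles to one semaphore only cost handle-table slots, while millions of distinct semaphores also cost kernel object memory, which is why the two tests hit different limits.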
Dave Page wrote:

>So, we seem to be hitting two limits here - the desktop heap, and
>something else which is cluster-specific. Investigation continues...

I will make these tests tonight or tomorrow morning and will let you know.

Rainer
On 10/22/07, Rainer Bauer <usenet@munnin.com> wrote:
> Well, after installing Process Explorer and starting the system information
> program, the kernel memory section shows me the current count, but not the
> limits (it says "no symbols"). I am currently downloading the "Debugging
> Tools for Windows". Maybe these limits are shown after the installation.

After you install that, go to Options->Configure Symbols in Process
Explorer. Change it to use the dbghelp.dll installed in the Debugging Tools
directory, and configure the symbol path like this:
http://support.microsoft.com/kb/311503

The limits should show up after that.
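For reference, the Microsoft symbol server path described in that KB article follows this pattern (the local cache directory C:\Symbols is an arbitrary choice, not something the article mandates):

```
SRV*C:\Symbols*http://msdl.microsoft.com/download/symbols
```

Paste that into the "Symbol path" field in Process Explorer's Configure Symbols dialog; symbols are then downloaded on demand and cached locally.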
On Mon, Oct 22, 2007 at 04:03:35PM +0200, Rainer Bauer wrote:
> >> shared_buffers = 512MB
> >
> >As a general note, this is *way* too high. All evidence I've seen points to
> >that you should have shared_buffers as *small* as possible on win32,
> >because memory access there is slow. And leave more of the caching up to
> >the OS.
>
> I followed Josh's advice here:
> <http://archives.postgresql.org/pgsql-performance/2007-06/msg00606.php>
>
> What value would you recommend then? The default 32MB?

That advice is good - for Unix platforms. For Windows, yes, try with 32MB.

//Magnus
Dave Page wrote:
> So, we seem to be hitting two limits here - the desktop heap, and
> something else which is cluster-specific. Investigation continues...

In further info, I've been testing this with the 8.3b1 release build that
we put out with pgInstaller, and a build with all optional dependencies
(OpenSSL, Kerberos, gettext, ldap etc) disabled. I'm seeing pretty much the
same results with each - roughly 9.6KB of desktop heap used per connection.

In addition, I've tried with standard pgbench runs, as well as a script
that just does 'select version()'. Again, no differences were observed.

Magnus and I did observe that we're using 1 user object and 4 GDI objects
per connection. If anyone happens to know how we might identify those,
please shout as so far we've drawn a blank :-(

Regards, Dave
On 10/21/07, Magnus Hagander <magnus@hagander.net> wrote:
> > I tried generating idle connections in an effort to reproduce
> > Laurent's problem, but I ran into a local limit instead: for each
> > backend, postmaster creates a thread and burns 4MB of its 2GB address
> > space. It fails around 490.
>
> Oh, that's interesting. That's actually a side effect of us increasing
> the stack size for the postgres.exe executable in order to work on other
> things. By default, it burns 1MB/thread, but ours will do 4MB. Never
> really thought of the problem that it'll run out of address space.
> Unfortunately, that size can't be changed in the CreateThread() call -
> only the initially committed size can be changed there.
>
> There are two ways to get around it - one is not using a thread for each
> backend, but a single thread that handles them all and then some sync
> objects around it. We originally considered this but said we won't
> bother changing it because the current way is simpler, and the overhead
> of a thread is tiny compared to a process. I don't think anybody even
> thought about the fact that it'd run you out of address space...

I'd probably take the approach of combining win32_waitpid() and threads.
You'd end up with 1 thread per 64 backends; when something interesting
happens the thread could push the info onto a queue, which the new
win32_waitpid() would check. Use APCs to add new backends to threads with
free slots.
On 10/22/07, Dave Page <dpage@postgresql.org> wrote:
> Dave Page wrote:
> > So, we seem to be hitting two limits here - the desktop heap, and
> > something else which is cluster-specific. Investigation continues...
>
> In further info, I've been testing this with the 8.3b1 release build
> that we put out with pgInstaller, and a build with all optional
> dependencies (OpenSSL, Kerberos, gettext, ldap etc) disabled. I'm seeing
> pretty much the same results with each - roughly 9.6KB of desktop heap
> used per connection.

The question is where that's coming from. I wondered if it was desktop
heap originally, but there's no reason it should be using it, and that
seems to be precisely the difference between my system and the others.
Connections here are barely making a dent; at 490 there's an entire 45KB
committed in the service desktop.

> Magnus and I did observe that we're using 1 user object and 4 GDI
> objects per connection. If anyone happens to know how we might identify
> those, please shout as so far we've drawn a blank :-(

Those appear to belong to the console window.

I've yet to do anything that generates real load (lightweight system), but
a simple "select version()" doesn't make any difference here either, and
raising shared buffers just makes postmaster run out of VM space faster.
(I don't think I mentioned that error before, but it shows up as "FATAL:
could not create sigchld waiter thread: error code 8".)
On Mon, Oct 22, 2007 at 08:04:03AM -0700, Trevor Talbot wrote:
> On 10/22/07, Dave Page <dpage@postgresql.org> wrote:
> > Dave Page wrote:
> > > So, we seem to be hitting two limits here - the desktop heap, and
> > > something else which is cluster-specific. Investigation continues...
> >
> > In further info, I've been testing this with the 8.3b1 release build
> > that we put out with pgInstaller, and a build with all optional
> > dependencies (OpenSSL, Kerberos, gettext, ldap etc) disabled. I'm seeing
> > pretty much the same results with each - roughly 9.6KB of desktop heap
> > used per connection.
>
> The question is where that's coming from. I wondered if it was
> desktop heap originally, but there's no reason it should be using it,
> and that seems to be precisely the difference between my system and
> the others. Connections here are barely making a dent; at 490 there's
> an entire 45KB committed in the service desktop.

Yes, that would be very interesting to know. Because obviously it's
something.

I read somewhere that Vista makes the size of the desktop heap dynamic, but
you were on 2003, right? Are you running the server as a service or from
the command prompt?

> > Magnus and I did observe that we're using 1 user object and 4 GDI
> > objects per connection. If anyone happens to know how we might identify
> > those, please shout as so far we've drawn a blank :-(
>
> Those appear to belong to the console window.

Makes sense - a window, a system menu, etc. There's probably a "hidden
console window" when running as a service...

> I've yet to do anything that generates real load (lightweight system),
> but a simple "select version()" doesn't make any difference here
> either, and raising shared buffers just makes postmaster run out of VM
> space faster. (I don't think I mentioned that error before, but it
> shows up as "FATAL: could not create sigchld waiter thread: error
> code 8".)

Yeah, that makes sense. We need to fix that, but I think that's too big of
a change to push during beta, given how few reports we've had of people
running into it.

//Magnus
On 10/22/07, Magnus Hagander <magnus@hagander.net> wrote:
> I read somewhere that Vista makes the size of the desktop heap dynamic, but
> you were on 2003, right?

Yeah, 32bit 2003 SP2, which has the same limits as XP. It looks like Vista
also has the same limits on actual heap sizes, but manages kernel address
space dynamically, so it doesn't get stuck with arbitrary limits there. I
don't have a Vista machine to verify though.

> Are you running the server as a service or from the command prompt?

Service, I've been using the standard MSI install of 8.2.5.

> > > Magnus and I did observe that we're using 1 user object and 4 GDI
> > > objects per connection. If anyone happens to know how we might identify
> > > those, please shout as so far we've drawn a blank :-(
> >
> > Those appear to belong to the console window.
>
> Makes sense - a window, a system menu, etc. There's probably a "hidden
> console window" when running as a service...

Well, the only thing actually running as a service is pg_ctl; the other
processes just belong to the same desktop. They're all console executables,
so they get the usual objects, but they're not visible anywhere.

It could be that there's a significant difference between XP and 2003 in
how that's handled though. I do have an XP SP2 machine here with 512MB RAM,
and I'll try tests on it as soon as I can free up what it's currently
occupied with.
Trevor Talbot wrote:
> The question is where that's coming from. I wondered if it was
> desktop heap originally, but there's no reason it should be using it,
> and that seems to be precisely the difference between my system and
> the others. Connections here are barely making a dent; at 490 there's
> an entire 45KB committed in the service desktop.

Hmm, Greg mentioned to me earlier that he was suspicious of SSPI, which
seems to drag in dependencies on gdi32.dll and user32.dll via secur32.dll.
Sure enough, testing with 8.2.5 on XP Pro, I get to 150 connections running
as a service having used 97.2% of desktop heap (vs. 45 connections max with
8.3).

So we have a pretty serious regression in 8.3.

Of course, that still doesn't tally up with what you're seeing on Win2k3.
I'll test on there tomorrow.

Regards, Dave
I wrote:

[ desktop heap usage ]

> It could be that there's a significant difference between XP and 2003
> in how that's handled though. I do have an XP SP2 machine here with
> 512MB RAM, and I'll try tests on it as soon as I can free up what it's
> currently occupied with.

...yep, under XP I'm using about 3.1KB of the service heap per connection,
which tears through it quite a bit faster. Now to figure out exactly where
it's coming from...
Trevor Talbot wrote:
> I wrote:
>
> [ desktop heap usage ]
>
>> It could be that there's a significant difference between XP and 2003
>> in how that's handled though. I do have an XP SP2 machine here with
>> 512MB RAM, and I'll try tests on it as soon as I can free up what it's
>> currently occupied with.
>
> ...yep, under XP I'm using about 3.1KB of the service heap per
> connection, which tears through it quite a bit faster. Now to figure
> out exactly where it's coming from...

That ties up with what I'm seeing - on 8.3 it's about 9.6KB per connection,
and I get a little under a third as many connections as 8.2 before it dies.

/D
Dave Page wrote:
> Trevor Talbot wrote:
>> The question is where that's coming from. I wondered if it was
>> desktop heap originally, but there's no reason it should be using it,
>> and that seems to be precisely the difference between my system and
>> the others. Connections here are barely making a dent; at 490 there's
>> an entire 45KB committed in the service desktop.
>
> Hmm, Greg mentioned to me earlier that he was suspicious of SSPI, which
> seems to drag in dependencies on gdi32.dll and user32.dll via
> secur32.dll. Sure enough, testing with 8.2.5 on XP Pro, I get to 150
> connections running as a service having used 97.2% of desktop heap (vs.
> 45 connections max with 8.3).
>
> So we have a pretty serious regression in 8.3.
>
> Of course, that still doesn't tally up with what you're seeing on
> Win2k3. I'll test on there tomorrow.

Could you try a build without SSPI? It should be as simple as removing the
#define ENABLE_SSPI 1 from port/win32.h. I don't think you need to touch
the linker lines at all, actually, so try without first.

//Magnus
Florian Weimer wrote:
> * Magnus Hagander:
>
>> Oh, that's interesting. That's actually a side effect of us increasing
>> the stack size for the postgres.exe executable in order to work on other
>> things. By default, it burns 1MB/thread, but ours will do 4MB. Never
>> really thought of the problem that it'll run out of address space.
>> Unfortunately, that size can't be changed in the CreateThread() call -
>> only the initially committed size can be changed there.
>
> Windows XP supports the STACK_SIZE_PARAM_IS_A_RESERVATION flag, which
> apparently allows reducing the reserved size. It might be better to do
> this the other way round, though (leave the reservation at its 1 MB
> default, and increase it only when necessary).

It does, but we still support Windows 2000 as well. I think it's better to
use a different method altogether - one not using one thread per child.

//Magnus
* Magnus Hagander:

> Oh, that's interesting. That's actually a side effect of us increasing
> the stack size for the postgres.exe executable in order to work on other
> things. By default, it burns 1MB/thread, but ours will do 4MB. Never
> really thought of the problem that it'll run out of address space.
> Unfortunately, that size can't be changed in the CreateThread() call -
> only the initially committed size can be changed there.

Windows XP supports the STACK_SIZE_PARAM_IS_A_RESERVATION flag, which
apparently allows reducing the reserved size. It might be better to do this
the other way round, though (leave the reservation at its 1 MB default, and
increase it only when necessary).
Trevor Talbot wrote:
> On 10/21/07, Magnus Hagander <magnus@hagander.net> wrote:
>
>>> I tried generating idle connections in an effort to reproduce
>>> Laurent's problem, but I ran into a local limit instead: for each
>>> backend, postmaster creates a thread and burns 4MB of its 2GB address
>>> space. It fails around 490.
>>
>> Oh, that's interesting. That's actually a side effect of us increasing
>> the stack size for the postgres.exe executable in order to work on other
>> things. By default, it burns 1MB/thread, but ours will do 4MB. Never
>> really thought of the problem that it'll run out of address space.
>> Unfortunately, that size can't be changed in the CreateThread() call -
>> only the initially committed size can be changed there.
>>
>> There are two ways to get around it - one is not using a thread for each
>> backend, but a single thread that handles them all and then some sync
>> objects around it. We originally considered this but said we won't
>> bother changing it because the current way is simpler, and the overhead
>> of a thread is tiny compared to a process. I don't think anybody even
>> thought about the fact that it'd run you out of address space...
>
> I'd probably take the approach of combining win32_waitpid() and
> threads. You'd end up with 1 thread per 64 backends; when something
> interesting happens the thread could push the info onto a queue, which
> the new win32_waitpid() would check. Use APCs to add new backends to
> threads with free slots.

I was planning to make it even easier and let Windows do the job for us,
just using RegisterWaitForSingleObject(). Does the same - one thread per 64
backends, but we don't have to deal with the queueing ourselves. Should be
rather trivial to do. Keeps win32_waitpid() unchanged.

That said, refactoring win32_waitpid() to be based on a queue might be a
good idea *anyway*. Have the callback from above put something in the
queue, and go with your idea for the rest.

//Magnus
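The approach Magnus describes can be sketched as follows. This is a Win32-only illustration of the API shape, not PostgreSQL's actual implementation; the callback and function names (child_exited, watch_backend) are invented for the example.

```c
/* Win32-only sketch: let the kernel's thread pool wait on child process
   handles instead of dedicating one 4MB-stack thread per backend. */
#include <windows.h>
#include <stdio.h>

/* Runs on a thread-pool thread when the child's process handle is
   signaled, i.e. when the backend exits. */
static VOID CALLBACK child_exited(PVOID context, BOOLEAN timed_out)
{
    DWORD pid = (DWORD)(ULONG_PTR) context;

    (void) timed_out;   /* INFINITE timeout below, so never TRUE here */
    /* Real code would queue (pid, exit code) and trigger the SIGCHLD
       emulation rather than printf from a pool thread. */
    printf("backend %lu exited\n", (unsigned long) pid);
}

static BOOL watch_backend(HANDLE proc, DWORD pid, PHANDLE wait_handle)
{
    /* The thread pool multiplexes up to MAXIMUM_WAIT_OBJECTS (64) waits
       per internal thread, so this scales far better than one dedicated
       waiter thread per backend. */
    return RegisterWaitForSingleObject(wait_handle, proc,
                                       child_exited,
                                       (PVOID)(ULONG_PTR) pid,
                                       INFINITE,
                                       WT_EXECUTEONLYONCE);
}
```

WT_EXECUTEONLYONCE deregisters the wait after the first completion, which matches process exit being a one-shot event; the returned wait handle still needs an UnregisterWait() during cleanup.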
Magnus Hagander wrote:
> Could you try a build without SSPI? It should be as simple as removing
> the #define ENABLE_SSPI 1 from port/win32.h. I don't think you need to
> touch the linker lines at all, actually, so try without first.

Nope, doesn't help - still using around 9.7KB per connection. Just to be
sure I did remove the link option, and checking with depends see that there
are now only delay load references to secur32.dll, nothing direct - as is
the case with 8.2.

So the only other changes I can think of that might affect things are the
VC++ build or the shared memory changes, though I can't see why they would
cause problems offhand. I'll go try a mingw build...

/D
Dave Page wrote:
> Magnus Hagander wrote:
>> Could you try a build without SSPI? It should be as simple as removing
>> the #define ENABLE_SSPI 1 from port/win32.h. I don't think you need to
>> touch the linker lines at all, actually, so try without first.
>
> Nope, doesn't help - still using around 9.7KB per connection. Just to be
> sure I did remove the link option, and checking with depends see that
> there are now only delay load references to secur32.dll, nothing direct
> - as is the case with 8.2.

OK. That makes sense, actually...

> So the only other changes I can think of that might affect things are
> the VC++ build or the shared memory changes, though I can't see why they
> would cause problems offhand. I'll go try a mingw build...

Yeah, it could be that the newer MSVCRT files do something we don't like.
Other than that, did we upgrade to a different version of some of our
dependencies?

Also, is this the DEBUG or RELEASE build of 8.3?

//Magnus
On 10/22/07, Magnus Hagander <magnus@hagander.net> wrote:
> Trevor Talbot wrote:
> > I'd probably take the approach of combining win32_waitpid() and
> > threads. You'd end up with 1 thread per 64 backends; when something
> > interesting happens the thread could push the info onto a queue, which
> > the new win32_waitpid() would check. Use APCs to add new backends to
> > threads with free slots.
>
> I was planning to make it even easier and let Windows do the job for us,
> just using RegisterWaitForSingleObject(). Does the same - one thread per
> 64 backends, but we don't have to deal with the queueing ourselves.

Oh, good call -- I keep forgetting the native thread pool exists.
Dave Page wrote:
> So the only other changes I can think of that might affect things are
> the VC++ build or the shared memory changes, though I can't see why they
> would cause problems offhand. I'll go try a mingw build...

A mingw build of stock 8.3b1, with no configure options specified at all,
consumes 3.2KB of desktop heap per connection. So, it's either something
we're doing differently with the VC++ compile/link options, or it's the VC8
runtimes using more resources.

Oh, and I still see the second limitation where it bombs out over about 125
connections, so that isn't build/runtime specific.

Shall we take this over to -hackers btw?

/D
Magnus Hagander wrote:
> Yeah, it could be that the newer MSVCRT files do something we don't
> like.. Other than that, did we upgrade to a different version of some of
> our dependents?

Most of them - but my test build is without any of them:

our $config = {
    asserts=>1,            # --enable-cassert
    integer_datetimes=>1,  # --enable-integer-datetimes
    nls=>undef,            # --enable-nls=<path>
    tcl=>undef,            # --with-tls=<path>
    perl=>undef,           # --with-perl
    python=>undef,         # --with-python=<path>
    krb5=>undef,           # --with-krb5=<path>
    ldap=>0,               # --with-ldap
    openssl=>undef,        # --with-ssl=<path>
    xml=>undef,
    xslt=>undef,
    iconv=>undef,
    zlib=>undef            # --with-zlib=<path>
};

> Also, is this the DEBUG or RELEASE build of 8.3?

Both behave similarly.

/D
Magnus Hagander <magnus@hagander.net> writes:
> I was planning to make it even easier and let Windows do the job for us,
> just using RegisterWaitForSingleObject(). Does the same - one thread per
> 64 backends, but we don't have to deal with the queueing ourselves.
> Should be rather trivial to do.

How can that possibly work? Backends have to be able to run concurrently,
and I don't see how they'll do that if they share a stack.

regards, tom lane
On 10/22/07, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Magnus Hagander <magnus@hagander.net> writes: > > I was planning to make it even easier and let Windows do the job for us, > > just using RegisterWaitForSingleObject(). Does the same - one thread per > > 64 backends, but we don't have to deal with the queueing ourselves. > > Should be rather trivial to do. > > How can that possibly work? Backends have to be able to run > concurrently, and I don't see how they'll do that if they share a stack. This is about what postmaster does for its SIGCHLD wait equivalent on win32. The 64 comes from Windows' object/event mechanism, which lets you perform a blocking wait on up to that many handles in a single call. Currently postmaster is creating a new thread to wait on only one backend at a time, so it ends up with too many threads.
Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> I was planning to make it even easier and let Windows do the job for us,
>> just using RegisterWaitForSingleObject(). Does the same - one thread per
>> 64 backends, but we don't have to deal with the queueing ourselves.
>> Should be rather trivial to do.
>
> How can that possibly work? Backends have to be able to run
> concurrently, and I don't see how they'll do that if they share a stack.

We're not talking about the backends, we're talking about the backend
waiter threads, whose sole purpose is to wait for a backend to die and
then raise a signal when it does. We can easily have the kernel wait for
a whole bunch of them at once, and have it call our callback function
whenever any one of them dies.

//Magnus
Magnus Hagander <magnus@hagander.net> writes: > We're not talking about the backends, we're talking about the backend > waiter threads whose sole purpose is to wait for a backend to die and > then raise a signal when it does. Oh, OK, I had not twigged to exactly what the threads were being used for. Never mind ... regards, tom lane
"Trevor Talbot" wrote:
>I wrote:
>
>[ desktop heap usage ]
>
>> It could be that there's a significant difference between XP and 2003
>> in how that's handled though. I do have an XP SP2 machine here with
>> 512MB RAM, and I'll try tests on it as soon as I can free up what it's
>> currently occupied with.
>
>...yep, under XP I'm using about 3.1KB of the service heap per
>connection, which tears through it quite a bit faster. Now to figure
>out exactly where it's coming from...

I can confirm this here (WinXP SP2).

I have restored the original postgresql.conf file that was created when
the cluster was initialized with Postgres 8.2.4-1 (the installed version
now is 8.2.5-1). The only other change to this installation is that I
have moved the WAL directory pg_xlog to another drive using a junction
link.

Here are my numbers from SysInternals System Information program:
Pages Limit: 364544KB [356MB]
Nonpaged Limit: 262144KB [256MB]
These limits are never reached.

Using the Desktop Heap Monitor, every new connection consumes 3232 bytes
of the total 512KB heap.

>It could be that there's a significant difference between XP and 2003
>in how that's handled though. I do have an XP SP2 machine here with
>512MB RAM, and I'll try tests on it as soon as I can free up what it's
>currently occupied with.

Yeah, Win2003 behaves differently according to this source:
<http://blogs.msdn.com/ntdebugging/archive/2007/01/04/desktop-heap-overview.aspx>

<quote>
Session paged pool allows session specific paged pool allocations. Windows XP
uses regular paged pool, since the number of remote desktop connections is
limited. On the other hand, Windows Server 2003 makes allocations from
session paged pool instead of regular paged pool if Terminal Services
(application server mode) is installed.
</quote>

After increasing the session heap size in the registry from 512KB to
1024KB, the number of connections was roughly doubled. So this might be
a solution for people running out of Desktop heap.
Alter the value of the following key:
<HKLM\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows>

The numeric values following "SharedSection=" control the heap management.

On WinXP these are the default values: "SharedSection=1024,3072,512"

Altering this to "SharedSection=1024,3072,1024" will increase the heap
for all non-interactive window stations to 1024KB.

Rainer
Rainer Bauer wrote:
>> ...yep, under XP I'm using about 3.1KB of the service heap per
>> connection, which tears through it quite a bit faster. Now to figure
>> out exactly where it's coming from...
>
> I can confirm this here (WinXP SP2).

It's coming from direct dependencies on user32.dll (from which we use
wsprintf()) and shell32.dll (from which we use SHGetSpecialFolderPath())
and is allocated when ResumeThread() is called to kickstart the new
backend, but before the backend actually does anything (proven with a
while(1) loop in main() for the -forkbackend case with a breakpoint on
ResumeThread() in the postmaster).

I've submitted a patch against 8.3 that removes these dependencies
altogether. Unfortunately, it seems we still have indirect dependencies
on user32.dll which I don't believe we can do anything about.

In testing, the patch reduces the per-connection desktop heap usage from
around 9.7KB to 3.2KB, which is back in line with 8.2.

Regards, Dave
Dave, > It's coming from direct dependencies on user32.dll (from which we use > wsprintf()) and shell32.dll (from which we use SHGetSpecialFolderPath()) > and is allocated when ResumeThread() is called to kickstart the new > backend, why does every backend need its own heap for user32.dll or shell32.dll? Wasn't the point of shared dlls to be shared? Harald -- GHUM Harald Massa persuadere et programmare Harald Armin Massa Spielberger Straße 49 70435 Stuttgart 0173/9409607 fx 01212-5-13695179 - EuroPython 2008 will take place in Vilnius, Lithuania - Stay tuned!
"Harald Armin Massa" <haraldarminmassa@gmail.com> writes: > Dave, > >> It's coming from direct dependencies on user32.dll (from which we use >> wsprintf()) and shell32.dll (from which we use SHGetSpecialFolderPath()) >> and is allocated when ResumeThread() is called to kickstart the new >> backend, > > why does every backend need its own heap for user32.dll or > shell32.dll? Wasn't the point of shared dlls to be shared? The Desktop Heap appears to be a place for processes belonging to the same "desktop" to allocate shared objects such as GUI elements. These are allocated in shared space so they can be manipulated by any process running in that "desktop". Why Shell32 and User32 are allocating space in there just to initialize themselves or handle these basic utility functions is a bit of a mystery. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Harald Armin Massa wrote:
> Dave,
>
>> It's coming from direct dependencies on user32.dll (from which we use
>> wsprintf()) and shell32.dll (from which we use SHGetSpecialFolderPath())
>> and is allocated when ResumeThread() is called to kickstart the new
>> backend,
>
> why does every backend need its own heap for user32.dll or
> shell32.dll? Wasn't the point of shared dlls to be shared?

No idea, and I thought so. It's quite easy to prove using the test
program attached. Just monitor the desktop heap with dheapmon (from
Microsoft's website), and run the program with a single command line
argument to get it to spawn 100 child processes. You can stop it
loading various DLLs by commenting out the dummy calls to functions in
them and rebuilding.

Of course, none of this would be an issue if we made the backend
multithreaded. :-) I'll get my coat...

/D

#include <stdio.h>
#include <Windows.h>
#include <winsock.h>
#define SECURITY_WIN32
#include <Security.h>
#include <shlobj.h>

int main(int argc, char *argv[])
{
    // Dummy functions to force linking to specific libs

    // user32.lib
    IsCharAlpha('a');

    // wsock32.lib
    WSADATA wsaData;
    WSAStartup(MAKEWORD(1, 1), &wsaData);

    // secur32.lib
    char un[30];
    DWORD dwUNLen = 30;
    GetUserNameExA(NameUserPrincipal, un, &dwUNLen);

    // advapi32.dll
    char un2[30];
    DWORD dwUN2Len = 30;
    GetUserNameA(un2, &dwUN2Len);

    // shell32.dll
    IsUserAnAdmin();

    // Used by child processes
    if (argc == 1)
    {
        while (1)
        {
            printf("Foo\n");
            Sleep(2000);
        }
    }
    else
    {
        for (int x = 0; x < 100; x++)
        {
            STARTUPINFOA si;
            PROCESS_INFORMATION pi;

            memset(&pi, 0, sizeof(pi));
            memset(&si, 0, sizeof(si));
            si.cb = sizeof(si);

            /*
             * Create the subprocess in a suspended state. This will be
             * resumed later, once we have written out the parameter file.
             */
            printf("Creating process %d...\n", x);
            if (!CreateProcessA(NULL, argv[0], NULL, NULL, TRUE,
                                CREATE_SUSPENDED, NULL, NULL, &si, &pi))
            {
                printf("CreateProcess call failed: %m (error code %d)",
                       (int) GetLastError());
                return -1;
            }

            printf("Resuming thread %d...\n", x);
            if (ResumeThread(pi.hThread) == -1)
            {
                if (!TerminateProcess(pi.hProcess, 255))
                {
                    printf("could not terminate unstartable process: error code %d",
                           (int) GetLastError());
                    CloseHandle(pi.hProcess);
                    CloseHandle(pi.hThread);
                    return -1;
                }
                CloseHandle(pi.hProcess);
                CloseHandle(pi.hThread);
                printf("could not resume thread of unstarted process: error code %d",
                       (int) GetLastError());
                return -1;
            }
        }
    }
    return 0;
}
> > why does every backend need its own heap for user32.dll or
> > shell32.dll? Wasn't the point of shared dlls to be shared?
>
> The Desktop Heap appears to be a place for processes belonging to the same
> "desktop" to allocate shared objects such as GUI elements. These are allocated
> in shared space so they can be manipulated by any process running in that
> "desktop".

Using this knowledge and Dave's response, and also looking back at
"3.2KB per backend", I stumbled upon this KB entry:

http://support.microsoft.com/?scid=kb%3Ben-us%3B184802&x=15&y=14

Please pay special attention to the following parts:

%SystemRoot%\system32\csrss.exe ObjectDirectory=\Windows
SharedSection=1024,3072,512 Windows=On SubSystemType=Windows
ServerDll=basesrv,1 ServerDll=winsrv:UserServerDllInitialization,3
ServerDll=winsrv:ConServerDllInitialization,2 ProfileControl=Off
MaxRequestThreads=16

"""The second SharedSection value (3072) is the size of the desktop heap
for each desktop that is associated with the "interactive" window
station WinSta0."""

and further down:

"""All services that are executed under the LocalSystem account with the
Allow Service to Interact with Desktop startup option selected will use
"Winsta0\Default". All these processes will share the same desktop heap
associated with the "Default" application desktop."""

Postgres is definitely NOT started under the LocalSystem account; so,
applying a "logical not" to Microsoft's words, could that indicate the
reason why our service backends consume that memory? Add to this that MS
SQL Server runs as LocalSystem, and as far as I know so does Oracle.

Is this a path of thinking to try?

Harald
--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
fx 01212-5-13695179
-
EuroPython 2008 will take place in Vilnius, Lithuania - Stay tuned!
Replying to myself.... > Postgres is definitely NOT started as LocalSystem account; so using a > "logical not" on Microsofts Words that could indicate the reason why > our service-backends consume that memory? Add to this that MS SQL runs > as LocalSystem; and as much as I know also Oracle. just some lines further down: """Every service process executed under a user account will receive a new desktop in a noninteractive window station created by the Service Control Manager (SCM). Thus, each service executed under a user account will consume the number of kilobytes of desktop heap specified in the third SharedSection value. All services executed under the LocalSystem account with Allow Service to Interact with the Desktop not selected share the desktop heap of the "Default" desktop in the noninteractive service windows station (Service-0x0-3e7$).""" it is exactly as suspected ... just starting the service allocates that heap Harald -- GHUM Harald Massa persuadere et programmare Harald Armin Massa Spielberger Straße 49 70435 Stuttgart 0173/9409607 fx 01212-5-13695179 - EuroPython 2008 will take place in Vilnius, Lithuania - Stay tuned!
Harald Armin Massa wrote: > Replying to myself.... >> Postgres is definitely NOT started as LocalSystem account; so using a >> "logical not" on Microsofts Words that could indicate the reason why >> our service-backends consume that memory? Add to this that MS SQL runs >> as LocalSystem; and as much as I know also Oracle. > > just some lines further down: > > """Every service process executed under a user account will receive a > new desktop in a noninteractive window station created by the Service > Control Manager (SCM). Thus, each service executed under a user > account will consume the number of kilobytes of desktop heap specified > in the third SharedSection value. All services executed under the > LocalSystem account with Allow Service to Interact with the Desktop > not selected share the desktop heap of the "Default" desktop in the > noninteractive service windows station (Service-0x0-3e7$).""" > > > it is exactly as suspected ... just starting the service allocates that heap You're missing the point I think. There's 48MB (iirc) on XP that is reserved for desktop heaps. From that, it allocates 64KB for WinSta0\Disconnect, 128KB for WinSta0\Winlogon and 3072KB for WinSta0\Default (ie. the regular desktop). Each additional session started by the SCM gets allocated the non-interactive default of 512KB. It's not the 48MB we're running out of, it's the 512KB. That's why if you look back in the thread, you'll see I found 8.3 was crashing with 46 connections when running as a service, but with much higher numbers of connections when run from the logon session. The reason why Microsoft services don't consume so much heap is that they are multi-threaded, not multi-process so they don't init user32.dll etc. for each individual connection like we do, but only once for the whole server. /D
On 10/23/07, Harald Armin Massa <haraldarminmassa@gmail.com> wrote: > > The Desktop Heap appears to be a place for processes belonging to the same > > "desktop" to allocate shared objects such as GUI elements. These are allocated > > in shared space so they can be manipulated by any process running in that > > "desktop". > > Using this knowledge and Daves response, also looking back at "3,2kb > per backend", I stumbled upon that KB entry: > > http://support.microsoft.com/?scid=kb%3Ben-us%3B184802&x=15&y=14 [...] > Postgres is definitely NOT started as LocalSystem account; so using a > "logical not" on Microsofts Words that could indicate the reason why > our service-backends consume that memory? Add to this that MS SQL runs > as LocalSystem; and as much as I know also Oracle. It's not quite what you think. The link Rainer posted upthread does a decent job describing it, although there's still some room for confusion: http://blogs.msdn.com/ntdebugging/archive/2007/01/04/desktop-heap-overview.aspx The hierarchy of containers goes Session, Window Station, Desktop. Everything relevant is under the same Session, so I'll ignore that for now. The console gets a Window Station; this is the interactive one since the user sitting down works with it directly. It normally contains one Desktop of interest (Default), which is what the user actually sees. (It's possible to create multiple desktops as a framework for a "virtual desktop" type of thing, but that's all third-party stuff.) Each service registered with the Service Manager has a specific account it logs in under. For each account, the Service Manager creates a Window Station to contain it, and all services using the same account share the default Desktop inside it. Most services run under one of the 3 canned accounts, which is what that KB article is talking about with the Local System bit. Each Desktop created has a fixed-size chunk of memory allocated to it. 
Desktops created under the interactive Window Station get the larger chunk of memory (3072KB) since they expect to contain lots of UI stuff. Desktops created under other Window Stations get the smaller chunk of memory (512KB), since they aren't presenting a UI to the user. That fixed-size desktop heap is used to track objects handled by the USER subsystem, which is mostly UI elements like windows and such. Most of the API interaction for those resources go through user32.dll, and apparently its initialization procedure grabs some of that heap space for each process it's loaded into. The PostgreSQL service is set to log in under its own account, so it gets its own Window Station, and a default Desktop inside that. This is a non-interactive Window Station, so the Desktop gets the smaller heap. All postgres.exe processes run in that Desktop and share one 512KB heap. As each process ends up carving out a chunk of that space, it uses up all 512KB and fails to create more backends.
On 10/23/07, Rainer Bauer <usenet@munnin.com> wrote: > "Trevor Talbot" wrote: > >It could be that there's a significant difference between XP and 2003 > >in how that's handled though. I do have an XP SP2 machine here with > >512MB RAM, and I'll try tests on it as soon as I can free up what it's > >currently occupied with. > > Yeah, Win2003 behaves differently accoriding to this source: > <http://blogs.msdn.com/ntdebugging/archive/2007/01/04/desktop-heap-overview.aspx> > > <quote> > Session paged pool allows session specific paged pool allocations. Windows XP > uses regular paged pool, since the number of remote desktop connections is > limited. On the other hand, Windows Server 2003 makes allocations from > session paged pool instead of regular paged pool if Terminal Services > (application server mode) is installed. > </quote> That's a little different. There's a specific range of kernel VM space dedicated to session-specific data, so each session references the same addresses but it can be backed by different physical memory (same concept as separate processes). The session paged pool area of that VM space is used to allocate the individual desktop heaps from. It's saying that under XP, it's mapped to the main kernel paged pool, while under 2003 TS it's mapped to session-specific memory, to avoid depleting the main paged pool. (Each Terminal Services connection creates an entire Session.) It doesn't change how desktop heap is actually used though, which is the issue we're running into. The system I'm testing on doesn't have Terminal Services running in appserver mode. > After increasing the session heap size in the registry from 512KB to 1024KB > the no. of connections was roughly doubled. So this might be a solution for > people running out of Desktop heap. 
> > Alter the value of the following key > <HKLM\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows> > > The numeric values following "SharedSection=" control the heap management: > On WinXP these are the default values: "SharedSection=1024,3072,512" > Altering this to "SharedSection=1024,3072,1024" will increase the heap for all > non-interactive window stations to 1024KB. It's probably safe to do on a typical XP box, but it's unfortunately not something you want the installer to do, or even suggest as blanket advice. I also wondered about having postmaster create more desktops on demand, but that has about the same amount of sanity (i.e. not much). I think it boils down to getting postgres to avoid using desktop heap if at all possible, and if not, advising people to avoid XP for high concurrency, except for suggesting the above change in specific circumstances. I suspect win2000 has the same issue, but I don't have a system to test. It'd be interesting to know if 2000 Professional behaves any differently than Server.
Rainer Bauer wrote: > After increasing the session heap size in the registry from 512KB to 1024KB > the no. of connections was roughly doubled. So this might be a solution for > people running out of Desktop heap. > > Alter the value of the following key > <HKLM\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows> > > The numeric values following "SharedSection=" control the heap management: > On WinXP these are the default values: "SharedSection=1024,3072,512" > Altering this to "SharedSection=1024,3072,1024" will increase the heap for all > non-interactive window stations to 1024KB. This part should go in the FAQ, I think. It's valid for 8.2 as well, from what I can tell, and it's valid for 8.3 both before and after the patch I just applied. Dave, you're listed as maintainer :-P //Magnus
Magnus Hagander wrote: > Rainer Bauer wrote: >> After increasing the session heap size in the registry from 512KB to 1024KB >> the no. of connections was roughly doubled. So this might be a solution for >> people running out of Desktop heap. >> >> Alter the value of the following key >> <HKLM\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows> >> >> The numeric values following "SharedSection=" control the heap management: >> On WinXP these are the default values: "SharedSection=1024,3072,512" >> Altering this to "SharedSection=1024,3072,1024" will increase the heap for all >> non-interactive window stations to 1024KB. > > This part should go in the FAQ, I think. It's valid for 8.2 as well, > from what I can tell, and it's valid for 8.3 both before and after the > patch I just applied. > > Dave, you're listed as maintainer :-P done. /D
Dave Page wrote:
>Magnus Hagander wrote:
>> Rainer Bauer wrote:
>>> After increasing the session heap size in the registry from 512KB to 1024KB
>>> the no. of connections was roughly doubled. So this might be a solution for
>>> people running out of Desktop heap.
>>>
>>> Alter the value of the following key
>>> <HKLM\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows>
>>>
>>> The numeric values following "SharedSection=" control the heap management:
>>> On WinXP these are the default values: "SharedSection=1024,3072,512"
>>> Altering this to "SharedSection=1024,3072,1024" will increase the heap for all
>>> non-interactive window stations to 1024KB.
>>
>> This part should go in the FAQ, I think. It's valid for 8.2 as well,
>> from what I can tell, and it's valid for 8.3 both before and after the
>> patch I just applied.
>>
>> Dave, you're listed as maintainer :-P
>
>done.

Dave, could you add that it's the third parameter of the "SharedSection"
string that must be changed? I read that KB article, but still had to
find the correct one by trial and error, which required a reboot every
time.

Rainer
> ------- Original Message -------
> From: Rainer Bauer <usenet@munnin.com>
> To: pgsql-general@postgresql.org
> Sent: 26/10/07, 18:09:26
> Subject: Re: [GENERAL] 8.2.3: Server crashes on Windows using Eclipse/Junit
>
> Dave could you add that it's the third parameter of the "SharedSection" string
> that must be changed. I read that KB article, but still had to find the
> correct one by trial and error, which required a reboot every time.

Err, it does say that:

"You can increase the non-interactive Desktop Heap by modifying the
third SharedSection value in the registry as described in this Microsoft
Knowledgebase article."

/D
"Dave Page" wrote: >> ------- Original Message ------- >> From: Rainer Bauer <usenet@munnin.com> >> To: pgsql-general@postgresql.org >> Sent: 26/10/07, 18:09:26 >> Subject: Re: [GENERAL] 8.2.3: Server crashes on Windows using Eclipse/Junit >> >> Dave could you add that it's the third parameter of the "SharedSection" string >> that must be changed. I read that KB article, but still had to find the >> correct one by trial and error, which required a reboot every time. > >Err, it does say that: > >You can increase the non-interactive Desktop Heap by modifying the third SharedSection value in the registry as describedin this Microsoft Knowledgebase article. Must have overlooked that part. Sorry for the noise. Rainer
Hi,

I'm not sure if this is good netiquette or not: I'm reviving a
month-old thread, because I'm trying to figure out how to resolve the
issue.

To summarize: when I run unit tests with Eclipse (and with Ant) on
Windows, at some point, I run out of available connections. I tried
increasing the maximum number of connections, but then I started seeing
the postgres server die and restart. I'm trying to fix this, yet again,
but I don't have a clear idea of what to fix.

On Tue, 23 Oct 2007 20:07:22 +0200, Magnus Hagander wrote:
> Rainer Bauer wrote:
>> After increasing the session heap size in the registry from 512KB to 1024KB
>> the no. of connections was roughly doubled. So this might be a solution for
>> people running out of Desktop heap.
>>
>> Alter the value of the following key
>> <HKLM\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows>
>>
>> The numeric values following "SharedSection=" control the heap management:
>> On WinXP these are the default values: "SharedSection=1024,3072,512"
>> Altering this to "SharedSection=1024,3072,1024" will increase the heap for all
>> non-interactive window stations to 1024KB.

Does this allow creating more connections? At some point, the discussion
became too technical for me, and I no longer could tell if the answer
was for developers or for users.

I saw other messages dealing with semaphores/connection relations, etc.
But unless I really did not understand the discussion, none of them
seemed to address the issue I was seeing.

I'm thinking that the Java driver combined with Hibernate may be keeping
handles open for too long, because my tests aren't supposed to maintain
connections open for very long. I also would expect the connections to
either be closed or released once the statements are executed.

> This part should go in the FAQ, I think. It's valid for 8.2 as well,
> from what I can tell, and it's valid for 8.3 both before and after the
> patch I just applied.
>
> Dave, you're listed as maintainer :-P
>
> //Magnus

--
Prenez la parole en public en étant    Speak to an audience while being
moins nerveux et plus convaincant!     less nervous and more convincing!
Abonnez-vous au bulletin gratuit!      Sign up for the free newsletter!
http://www.duperval.com (514) 902-0186
On 11/29/07, Laurent Duperval <lduperval@yahoo.com> wrote: > To summarize: when I run unit tests with eclipse (and with Ant) on > Windows, at some point, I run out of available connections. I tried > increasing the maximum number of connections, but then I started seeing > the postgres server die and restart. The conclusion was that under Windows XP, postgres is normally limited to a maximum of 125-150 connections. Raising max_connections higher than that will lead to the crashes you saw. > > Rainer Bauer wrote: > >> After increasing the session heap size in the registry from 512KB to 1024KB > >> the no. of connections was roughly doubled. So this might be a solution for > >> people running out of Desktop heap. > >> > >> Alter the value of the following key > >> <HKLM\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows> > >> > >> The numeric values following "SharedSection=" control the heap management: > >> On WinXP these are the default values: "SharedSection=1024,3072,512" > >> Altering this to "SharedSection=1024,3072,1024" will increase the heap for all > >> non-interactive window stations to 1024KB. > Does this allow creating more connections? At some point, the discussion > became too technical for me, and I no longer could tell if the answer was > for developers of for users. Yes. After making that change and restarting Windows, postgres will be able to safely handle 250-300 connections. > I saw other messages dealing with semaphores/connection relations, etc. > But unless I really did not understand the discussion, none of them seemed > to address the issue I was seeing. Right, we were just trying to find the precise resource limit that was causing the crash. > I'm thinking that the Java driver combined with Hibernate may be keeping > handles open for too long, because my tests aren't supposed to maintain > connections open for very long. I also would expect the connections to > either be closed or released once the statements are executed. 
This is where I would start on your problem. Increasing the max connections is one thing, but having so very many simultaneous operations in progress on your database is probably not productive, as it's likely to spend more time juggling tasks than actually performing them. I'm not familiar with Java tools, so someone else will have to chime in with specific suggestions. It may be something as simple as limiting how many tests JUnit/Ant tries to run at the same time, or some parameter buried in Hibernate or the driver.
Laurent Duperval wrote:
> Does this allow creating more connections? At some point, the discussion
> became too technical for me, and I no longer could tell if the answer
> was for developers or for users.

Yeah, it did become something of an investigation into the problem which
probably should have been moved to -hackers. I summarised the info in
the FAQ

http://www.postgresql.org/docs/faqs.FAQ_windows.html#4.4

for user consumption, and included a link to the MS Knowledgebase
article that shows what to tweak in the registry.

> I saw other messages dealing with semaphores/connection relations, etc.
> But unless I really did not understand the discussion, none of them seemed
> to address the issue I was seeing.

Yes, that was all about how we were using threads to manage interprocess
communications. We found a far more efficient way to do that, but my
guess is that that's not your problem.

> I'm thinking that the Java driver combined with Hibernate may be keeping
> handles open for too long, because my tests aren't supposed to maintain
> connections open for very long. I also would expect the connections to
> either be closed or released once the statements are executed.

That could be an issue with Hibernate or the other code you're running,
but yes, if it's opening lots of connections and keeping them open, that
could be what's wrong, and I would suggest checking the FAQ above.

Regards, Dave