Thread: 8.2.3: Server crashes on Windows using Eclipse/Junit
Hi,

I have a large number of tests I run in Eclipse to test my application. Many of them create and delete a lot of information in PG and at some point, PG will crash and restart. I get an error in the logs that states:

Server process exited with exit code -1073741502
.
.
.
Terminating connection because of crash of another server process

If it helps:

- I am using Windows XP
- I have 2 GB of memory
- I am using JPA/Hibernate3 and the Postgres Java driver

Any ideas?

L

--
Prenez la parole en public en étant    Speak to an audience while being
moins nerveux et plus convaincant!     less nervous and more convincing!
Éveillez l'orateur en vous!            Bring out the speaker in you!
Information: laurent@duperval.com  http://www.duperval.com  (514) 902-0186
On Mon, 15 Oct 2007, Laurent Duperval wrote:

> I have a large number of tests I run in Eclipse to test my application.
> Many of them create and delete a lot of information in PG and at some
> point, PG will crash and restart.
>
> I get an error in the logs that states:
>
> Server process exited with exit code -1073741502

This is likely a server bug. If you can isolate the failing test and extract
a self-contained example, someone can probably fix it.

Kris Jurka
On Mon, 15 Oct 2007 15:06:37 -0400, Kris Jurka wrote:

>> I get an error in the logs that states:
>>
>> Server process exited with exit code -1073741502
>
> This is likely a server bug. If you can isolate the failing test and
> extract a self-contained example, someone can probably fix it.

It seems to be some sort of interaction between Eclipse and the JUnit/Postgres
driver. When I run my tests, just before the server crashes, I have dozens
and dozens of spawned Postgres processes. When the crash occurs, all
processes are killed and restarted. And this cycle continues until the tests
complete.

When I run the tests from an ant script I also see some spawned processes,
but nothing like when running in Eclipse.

If I run each test case separately, I don't see this issue. But when I run
them as a whole (i.e. run all tests defined in my application) I get the
same error every time.

L
I will add that speed may be a factor as well: when I increase the amount of
logging by the PG server, I see the problem less often.

L

On Mon, 15 Oct 2007 17:58:48 +0000, Laurent Duperval wrote:

> Hi,
>
> I have a large number of tests I run in Eclipse to test my application.
> Many of them create and delete a lot of information in PG and at some
> point, PG will crash and restart.
>
> I get an error in the logs that states:
>
> Server process exited with exit code -1073741502
> .
> .
> .
> Terminating connection because of crash of another server process
>
> If it helps:
>
> - I am using Windows XP
> - I have 2 GB of memory
> - I am using JPA/Hibernate3 and the Postgres Java driver
>
> Any ideas?
>
> L
Laurent Duperval wrote:

> On Mon, 15 Oct 2007 15:06:37 -0400, Kris Jurka wrote:
>
>>> I get an error in the logs that states:
>>>
>>> Server process exited with exit code -1073741502
>>
>> This is likely a server bug. If you can isolate the failing test and
>> extract a self-contained example, someone can probably fix it.
>
> It seems to be some sort of interaction between Eclipse and the
> JUnit/Postgres driver. When I run my tests, just before the server
> crashes, I have dozens and dozens of spawned Postgres processes. When
> the crash occurs, all processes are killed and restarted. And this cycle
> continues until the tests complete.

The fact that all Postgres processes disappear is normal. Postgres itself
(more precisely, the postmaster process) kills all server processes when one
of them dies unexpectedly.

> When I run the tests from an ant script I also see some spawned processes,
> but nothing like when running in Eclipse.
>
> If I run each test case separately, I don't see this issue. But when I run
> them as a whole (i.e. run all tests defined in my application) I get the
> same error every time.

Maybe Eclipse is trying to run more of them at a time than ant, and the
extra concurrency is killing the server for some reason. Was this compiled
with Cygwin, or is it the native (mingw) version?

--
Alvaro Herrera              http://www.amazon.com/gp/registry/DXLWNGRJD34J
"La vida es para el que se aventura"
Alvaro Herrera <alvherre@commandprompt.com> writes:

> Maybe Eclipse is trying to run more of them at a time than ant, and the
> extra concurrency is killing the server for some reason. Was this
> compiled with Cygwin, or is it the native (mingw) version?

Don't both those builds have some hard-wired upper limit on the number of
child processes? I wonder what Laurent has max_connections set to... if it's
larger than the build could actually support, perhaps this behavior would be
the result.

regards, tom lane
On 10/16/07, Alvaro Herrera <alvherre@commandprompt.com> wrote:

> Laurent Duperval wrote:
>
>>>> I get an error in the logs that states:
>>>>
>>>> Server process exited with exit code -1073741502

FYI, this exit code means a DLL's initialization routine indicated failure
during process startup.

>> If I run each test case separately, I don't see this issue. But when I run
>> them as a whole (i.e. run all tests defined in my application) I get the
>> same error every time.
>
> Maybe Eclipse is trying to run more of them at a time than ant, and the
> extra concurrency is killing the server for some reason.

Sounds likely.
Hi,

Sorry for top-posting, but I am answering questions that don't all appear in
this message:

- I installed the default download of Postgres. I didn't compile it myself,
  so it's probably the mingw version.

- max_connections is set to 500. I did that originally because I kept seeing
  a message about no connection being available, and I thought it was because
  I was not allocating enough connections. My machine has 2 GB of RAM.

- How do I determine which DLL is failing and what is causing it to fail in
  its initialization routine?

Thanks,

L

On Tue, 16 Oct 2007 16:02:32 -0700, Trevor Talbot wrote:

> On 10/16/07, Alvaro Herrera <alvherre@commandprompt.com> wrote:
>> Laurent Duperval wrote:
>>>>> I get an error in the logs that states:
>>>>>
>>>>> Server process exited with exit code -1073741502
>
> FYI, this exit code means a DLL's initialization routine indicated
> failure during process startup.
>
>>> If I run each test case separately, I don't see this issue. But when I
>>> run them as a whole (i.e. run all tests defined in my application) I
>>> get the same error every time.
>>
>> Maybe Eclipse is trying to run more of them at a time than ant, and the
>> extra concurrency is killing the server for some reason.
>
> Sounds likely.
> Hi,
>
> Sorry for top-posting, but I am answering questions that don't all appear
> in this message:
>
> - I installed the default download of Postgres. I didn't compile it
>   myself, so it's probably the mingw version.

It is.

> - max_connections is set to 500. I did that originally because I kept
>   seeing a message about no connection being available, and I thought it
>   was because I was not allocating enough connections. My machine has
>   2 GB of RAM.

There's your problem. 500 is way above what the Windows version can handle.
IIRC the hard max is somewhere around 200, depending on some OS factors that
we don't entirely understand. I'd never recommend going above 100-150. With
no more than 2 GB of RAM, not above 100.

You'll need to figure out what's eating all your connections - it sounds
like it's not entirely expected. Perhaps connections are leaked somewhere?

> - How do I determine which DLL is failing and what is causing it to fail
>   in its initialization routine?

You really can't in this case, but if you could it wouldn't help you. It's
Windows running out of global resources.

/Magnus
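Magnus's leak theory is easy to picture: a test suite that opens a connection per test and never closes it piles up backends until the server falls over. A minimal, hypothetical sketch of the two patterns (the `FakePool` stub below stands in for a real driver or pool; it is not Laurent's code and not the JDBC API):

```python
# Illustration of how a test suite leaks connections when nothing closes
# them. FakePool is a stand-in: connect() hands back an object whose
# close() must be called, just like a real driver connection.
class FakePool:
    def __init__(self):
        self.open_count = 0

    def connect(self):
        self.open_count += 1
        return self  # stand-in for a connection object

    def close(self):
        self.open_count -= 1

# Leaky pattern: each "test" connects but never closes.
pool = FakePool()
for _ in range(50):
    pool.connect()
print(pool.open_count)   # 50 connections still open after 50 tests

# Correct pattern: close in a finally block (or a context manager),
# so connections are released even when a test fails.
pool2 = FakePool()
for _ in range(50):
    conn = pool2.connect()
    try:
        pass  # the test's queries would run here
    finally:
        conn.close()
print(pool2.open_count)  # 0
```

With 500 tests and the leaky pattern, hitting a max_connections of 500 mid-run is exactly what you'd expect.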
"Magnus Hagander" <magnus@hagander.net> writes:

>> - max_connections is set to 500.
>
> There's your problem. 500 is way above what the Windows version can
> handle. IIRC the hard max is somewhere around 200, depending on some OS
> factors that we don't entirely understand.

Maybe we should put an #ifdef WIN32 into guc.c to limit max_connections to
something we know the platform can stand? It'd be more comfortable if we
understood exactly where the limit was, but I think I'd rather have an "I'm
sorry Dave, I can't do that" than random-seeming crashes.

regards, tom lane
On Wed, Oct 17, 2007 at 02:40:14AM -0400, Tom Lane wrote:

> "Magnus Hagander" <magnus@hagander.net> writes:
>>> - max_connections is set to 500.
>>
>> There's your problem. 500 is way above what the Windows version can
>> handle. IIRC the hard max is somewhere around 200, depending on some OS
>> factors that we don't entirely understand.
>
> Maybe we should put an #ifdef WIN32 into guc.c to limit max_connections
> to something we know the platform can stand? It'd be more comfortable
> if we understood exactly where the limit was, but I think I'd rather
> have an "I'm sorry Dave, I can't do that" than random-seeming crashes.

Yeah, that's probably a good idea - except we never managed to figure out
where the limit is. It appears to vary pretty wildly between different
machines, for reasons we don't really understand (total RAM has some effect
on it, but it's not the only one, for example).

//Magnus
On Wed, 17 Oct 2007 09:22:11 +0200 Magnus Hagander <magnus@hagander.net> wrote:

>> Maybe we should put an #ifdef WIN32 into guc.c to limit
>> max_connections to something we know the platform can stand? It'd
>> be more comfortable if we understood exactly where the limit was,
>> but I think I'd rather have an "I'm sorry Dave, I can't do that"
>> than random-seeming crashes.
>
> Yeah, that's probably a good idea - except we never managed to
> figure out where the limit is. It appears to vary pretty wildly
> between different machines, for reasons we don't really understand
> (total RAM has some effect on it, but it's not the only one, for
> example).

How about we just emit a warning:

WARNING: Connections above 250 on Windows platforms may have unpredictable
results.

Joshua D. Drake

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997   http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
"Magnus Hagander" wrote:

>> - max_connections is set to 500. I did that originally because I kept
>> seeing a message about no connection being available, and I thought it
>> was because I was not allocating enough connections. My machine has
>> 2 GB of RAM.
>
> There's your problem. 500 is way above what the Windows version can
> handle. IIRC the hard max is somewhere around 200, depending on some OS
> factors that we don't entirely understand. I'd never recommend going
> above 100-150. With no more than 2 GB of RAM, not above 100.

My guess is that Windows is running out of handles. Each backend uses about
150 handles. 100 backends means 15,000 handles. Depending on how many other
programs are currently running, the number of startable backends will vary
with the total handle limit Windows imposes.

Rainer
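Rainer's numbers are simple to model. The sketch below takes his ~150 handles per backend as a given (it is a measurement from his system, not a documented Windows constant):

```python
# Estimate total kernel handles held by Postgres server processes,
# using Rainer's observed figure of ~150 handles per backend.
HANDLES_PER_BACKEND = 150  # measured on one system; not a spec value

def estimated_handles(backends):
    """Rough handle total for `backends` concurrent server processes."""
    return backends * HANDLES_PER_BACKEND

print(estimated_handles(100))  # 15000, the figure quoted above
print(estimated_handles(500))  # 75000 at Laurent's max_connections setting
```

Under this model Laurent's max_connections = 500 implies roughly five times the handle load Rainer considers safe, before counting any other programs' demand.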
>>> Maybe we should put an #ifdef WIN32 into guc.c to limit
>>> max_connections to something we know the platform can stand? It'd
>>> be more comfortable if we understood exactly where the limit was,
>>> but I think I'd rather have an "I'm sorry Dave, I can't do that"
>>> than random-seeming crashes.
>>
>> Yeah, that's probably a good idea - except we never managed to
>> figure out where the limit is. It appears to vary pretty wildly
>> between different machines, for reasons we don't really understand
>> (total RAM has some effect on it, but it's not the only one, for
>> example).
>
> How about we just emit a warning:
>
> WARNING: Connections above 250 on Windows platforms may have
> unpredictable results.

That's probably a better idea. I'll go look at that, unless people feel we
should just stick it in the docs/FAQ?

/Magnus
"Magnus Hagander" <magnus@hagander.net> writes:

>> How about we just emit a warning:
>>
>> WARNING: Connections above 250 on Windows platforms may have
>> unpredictable results.
>
> That's probably a better idea. I'll go look at that, unless people feel
> we should just stick it in the docs/FAQ?

Unless we've got some credible basis for citing a particular number, I don't
think this will help much.

Rainer Bauer <usenet@munnin.com> writes:

> My guess is that Windows is running out of handles. Each backend uses
> about 150 handles. 100 backends means 15,000 handles. Depending on how
> many other programs are currently running, the number of startable
> backends will vary with the total handle limit Windows imposes.

I find this theory very interesting; for one thing it explains the reported
variability of results, since the non-Postgres demand for handles could be
anything. Is there any way we could check it?

If it's accurate, what we ought to be whining about is some combination of
max_connections and max_files_per_process, rather than only considering the
former.

regards, tom lane
Tom Lane wrote:

> "Magnus Hagander" <magnus@hagander.net> writes:
>>> How about we just emit a warning:
>>>
>>> WARNING: Connections above 250 on Windows platforms may have
>>> unpredictable results.
>>
>> That's probably a better idea. I'll go look at that, unless people feel
>> we should just stick it in the docs/FAQ?
>
> Unless we've got some credible basis for citing a particular number,
> I don't think this will help much.

OK. Maybe a note in the docs or FAQ at least?

> Rainer Bauer <usenet@munnin.com> writes:
>> My guess is that Windows is running out of handles. Each backend uses
>> about 150 handles. 100 backends means 15,000 handles. Depending on how
>> many other programs are currently running, the number of startable
>> backends will vary with the total handle limit Windows imposes.
>
> I find this theory very interesting; for one thing it explains the
> reported variability of results, since the non-Postgres demand for
> handles could be anything. Is there any way we could check it?
> If it's accurate, what we ought to be whining about is some
> combination of max_connections and max_files_per_process, rather
> than only considering the former.

It's not that simple. Merlin ran some checks, and drastically reducing
max_files_per_process made no measurable difference.

My best guess is it's due to the non-paged pool. Handles are a part of what
goes in there, but only a part.

//Magnus
On 10/17/07, Magnus Hagander <magnus@hagander.net> wrote:

> On Wed, Oct 17, 2007 at 02:40:14AM -0400, Tom Lane wrote:
>> Maybe we should put an #ifdef WIN32 into guc.c to limit max_connections
>> to something we know the platform can stand? It'd be more comfortable
>> if we understood exactly where the limit was, but I think I'd rather
>> have an "I'm sorry Dave, I can't do that" than random-seeming crashes.
>
> Yeah, that's probably a good idea - except we never managed to figure
> out where the limit is. It appears to vary pretty wildly between
> different machines, for reasons we don't really understand (total RAM
> has some effect on it, but it's not the only one, for example).

I tried generating idle connections in an effort to reproduce Laurent's
problem, but I ran into a local limit instead: for each backend, postmaster
creates a thread and burns 4 MB of its 2 GB address space. It fails around
490. Laurent's issue must depend on other load characteristics.

It's possible to get a trace of DLL loads, but I haven't found a noninvasive
way of doing that. It seems to require that a debugger be attached.
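The arithmetic behind that ~490 figure is worth spelling out. A sketch under the assumptions Trevor states (a 32-bit process with a 2 GB user address space and a 4 MB stack reservation per thread):

```python
# Why postmaster's address space runs out near 490 backend threads.
MB = 1024 * 1024
user_address_space = 2 * 1024 * MB  # 2 GB for a 32-bit Windows process
stack_reserve_per_thread = 4 * MB   # this build reserves 4 MB per thread

# Hard ceiling if thread stacks were the only thing in the address space:
ceiling = user_address_space // stack_reserve_per_thread
print(ceiling)  # 512

# The executable image, heaps, DLLs, and the shared-memory mapping occupy
# the remainder, so failing "around 490" threads fits this model.
```

Note this is a limit on postmaster's virtual address space, not on kernel handles, which is why it shows up regardless of how cheap the handles themselves are.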
On 10/20/07, Rainer Bauer <usenet@munnin.com> wrote:

> "Magnus Hagander" wrote:
>
>>>> - max_connections is set to 500. I did that originally because I kept
>>>> seeing a message about no connection being available, and I thought
>>>> it was because I was not allocating enough connections. My machine
>>>> has 2 GB of RAM.
>>>
>>> There's your problem. 500 is way above what the Windows version can
>>> handle. IIRC the hard max is somewhere around 200, depending on some
>>> OS factors that we don't entirely understand. I'd never recommend
>>> going above 100-150. With no more than 2 GB of RAM, not above 100.
>>
>> My guess is that Windows is running out of handles. Each backend uses
>> about 150 handles. 100 backends means 15,000 handles. Depending on how
>> many other programs are currently running, the number of startable
>> backends will vary with the total handle limit Windows imposes.

Those are kernel object handles; the ceiling does depend on available kernel
memory, but they're cheap, and postgres is in no danger of running into that
limit. Most of the handle limits people talk about are on USER (window etc.)
objects, which come from a single shared pool.
--- "Joshua D. Drake" <jd@commandprompt.com> wrote:

> How about we just emit a warning:
>
> WARNING: Connections above 250 on Windows platforms may have
> unpredictable results.
>
> Joshua D. Drake

I'd personally vote for a lower warning limit like 175, as I can
consistently crash PostgreSQL on a Windows system right around the 200th
connection.

Regards,

Shelby Cain

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com
On 10/20/07, Shelby Cain <alyandon@yahoo.com> wrote:

> I'd personally vote for a lower warning limit like 175, as I can
> consistently crash PostgreSQL on a Windows system right around the 200th
> connection.

What error gets logged for your crashes?
"Trevor Talbot" wrote:

> On 10/20/07, Rainer Bauer <usenet@munnin.com> wrote:
>> My guess is that Windows is running out of handles. Each backend uses
>> about 150 handles. 100 backends means 15,000 handles. Depending on how
>> many other programs are currently running, the number of startable
>> backends will vary with the total handle limit Windows imposes.
>
> Those are kernel object handles; the ceiling does depend on available
> kernel memory, but they're cheap, and postgres is in no danger of
> running into that limit. Most of the handle limits people talk about
> are on USER (window etc.) objects, which come from a single shared
> pool.

You are right. I just did a quick test, and depending on the handle type
these limits are quite high. I could create 5 million events, 4 million
semaphores, or 3.5 million mutexes before the system returned error 1816
ERROR_NOT_ENOUGH_QUOTA "Not enough quota is available to process this
command.".

Rainer
--- Trevor Talbot <quension@gmail.com> wrote:

> On 10/20/07, Shelby Cain <alyandon@yahoo.com> wrote:
>
>> I'd personally vote for a lower warning limit like 175, as I can
>> consistently crash PostgreSQL on a Windows system right around the
>> 200th connection.
>
> What error gets logged for your crashes?

It's been a while, but IIRC there wasn't anything in the logs other than an
entry noting that a backend had crashed unexpectedly, so the postmaster was
restarting all active backends. I can trivially reproduce it at work on my
workstation if you need the exact error text.

Regards,

Shelby Cain
I wrote:

> You are right. I just did a quick test, and depending on the handle type
> these limits are quite high. I could create 5 million events, 4 million
> semaphores, or 3.5 million mutexes before the system returned error 1816
> ERROR_NOT_ENOUGH_QUOTA "Not enough quota is available to process this
> command.".

[Does some further testing]

The limit is high, but nonetheless Postgres is running out of handles.
Setting <max_connections> to 10000 and starting Postgres _without_ any
connection consumes 40,000 handles. This corresponds to the 4 Postgres
processes running after the server was started. Every new connection
consumes another 10,000 handles.

I don't know the Postgres code involved, but it seems that every backend
consumes at least <max_connections> handles. Hence increasing this value
will have the opposite effect once a certain threshold is met.

Rainer
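Rainer's observation implies a simple model, sketched here for clarity using his numbers: roughly max_connections handles per server process, with 4 processes present at startup.

```python
# Model of handle consumption implied by Rainer's measurements: each
# server process holds about max_connections handles (one per pre-created
# semaphore), and 4 processes exist before any client connects.
def total_handles(max_connections, connections, base_processes=4):
    return (base_processes + connections) * max_connections

print(total_handles(10000, 0))  # 40000 handles at startup, as observed
print(total_handles(10000, 1))  # 50000 after the first connection
```

The model makes the "opposite effect" concrete: raising max_connections to allow more clients multiplies the handle cost of every backend that does connect.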
On 10/20/07, Shelby Cain <alyandon@yahoo.com> wrote:

> --- Trevor Talbot <quension@gmail.com> wrote:
>
>> What error gets logged for your crashes?
>
> It's been a while, but IIRC there wasn't anything in the logs other than
> an entry noting that a backend had crashed unexpectedly, so the
> postmaster was restarting all active backends. I can trivially
> reproduce it at work on my workstation if you need the exact error
> text.

I think it would be useful; if nothing else, maybe it'll tell us if you can
see the same problem Laurent does, or if it's a different limit entirely.
Shelby Cain wrote:

> --- Trevor Talbot <quension@gmail.com> wrote:
>
>> What error gets logged for your crashes?
>
> It's been a while, but IIRC there wasn't anything in the logs other than
> an entry noting that a backend had crashed unexpectedly, so the
> postmaster was restarting all active backends. I can trivially
> reproduce it at work on my workstation if you need the exact error
> text.

I could reproduce this here:

Server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request

2007-10-20 23:33:42 LOG:  server process (PID 5240) exited with exit code
-1073741502

Shelby, are you using the /3GB switch by chance? This will halve the number
of available handles on your system.

Rainer
On 10/20/07, Rainer Bauer <usenet@munnin.com> wrote:

> I could reproduce this here:
>
> Server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request
>
> 2007-10-20 23:33:42 LOG:  server process (PID 5240) exited with exit code
> -1073741502

How?
--- Rainer Bauer <usenet@munnin.com> wrote:

> I could reproduce this here:
>
> Server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request
>
> 2007-10-20 23:33:42 LOG:  server process (PID 5240) exited with exit code
> -1073741502
>
> Shelby, are you using the /3GB switch by chance? This will halve the
> number of available handles on your system.
>
> Rainer

Probably not, although I haven't examined boot.ini. My workstation only has
1.5 GB of RAM, so I'm highly doubtful that IBM would have configured it to
boot with the /3GB switch.

Regards,

Shelby Cain
I wrote:

> Anyway, the problem is the number of semaphores created by Postgres:
> every backend creates at least 4*<max_connections> semaphores.

Sorry, this must read <max_connections> semaphores, not 4 times that.

Rainer
Hello Trevor,

Sunday, October 21, 2007, 12:15:25 AM, you wrote:

TT> On 10/20/07, Rainer Bauer <usenet@munnin.com> wrote:
>> I could reproduce this here:
>>
>> Server closed the connection unexpectedly
>>         This probably means the server terminated abnormally
>>         before or while processing the request
>>
>> 2007-10-20 23:33:42 LOG:  server process (PID 5240) exited with exit
>> code -1073741502

TT> How?

Seems like the mailing list is not catching up fast enough (I am posting
through Usenet)...

Anyway, the problem is the number of semaphores created by Postgres: every
backend creates at least 4*<max_connections> semaphores. Just increase
<max_connections> to an unusually high value (say 10000) and start creating
new connections while monitoring the handle count.

Rainer
On 10/20/07, Rainer Bauer <usenet@munnin.com> wrote:

> Anyway, the problem is the number of semaphores created by Postgres:
> every backend creates at least 4*<max_connections> semaphores. Just
> increase <max_connections> to an unusually high value (say 10000) and
> start creating new connections while monitoring the handle count.

Hmm, they're actually the same semaphores, so the only cost is for slots in
each process's handle table, which comes from kernel paged pool. Testing
shows I can easily create about 30 million handles to a given object on this
machine. This is under win2003 with 1.25 GB RAM, which gives it a paged pool
limit of 352 MB.

I tried going up to 20000 max_connections, and still blew postmaster's VM
space long before paged pool was exhausted. I couldn't test any higher
values, as there's some interaction between max_connections and
shared_buffers that prevents it from mapping the buffer contiguously.

Something's missing though, since I'm not hitting the same issue you are.
How are you generating the connections? I just have an app calling
PQconnectdb() in a loop, but I guess that's not good enough.
Trevor Talbot wrote:

> On 10/20/07, Rainer Bauer <usenet@munnin.com> wrote:
>
>> Anyway, the problem is the number of semaphores created by Postgres:
>> every backend creates at least 4*<max_connections> semaphores. Just
>> increase <max_connections> to an unusually high value (say 10000) and
>> start creating new connections while monitoring the handle count.
>
> Hmm, they're actually the same semaphores, so the only cost is for
> slots in each process's handle table, which comes from kernel paged
> pool. Testing shows I can easily create about 30 million handles to a
> given object on this machine. This is under win2003 with 1.25 GB RAM,
> which gives it a paged pool limit of 352 MB.
>
> I tried going up to 20000 max_connections, and still blew postmaster's
> VM space long before paged pool was exhausted. I couldn't test any
> higher values, as there's some interaction between max_connections and
> shared_buffers that prevents it from mapping the buffer contiguously.
>
> Something's missing though, since I'm not hitting the same issue you
> are. How are you generating the connections? I just have an app
> calling PQconnectdb() in a loop, but I guess that's not good enough.

Yeah, something is obviously missing. Are you guys on exactly the same
Windows versions? WRT both version and service pack. Anybody on x64 Windows?

Another thing worth testing - check if the amount of shared memory used
makes a noticeable difference. Try both very small and very large values.

I don't think the paged pool is the problem - I think it's the nonpaged
pool. Would be interesting to track that one in the failing case (using
performance monitor, up to the point where it fails). And the nonpaged one
is smaller...

If that looks like it's the problem, it could be helpful to do a pooltag
trace on it (see for example
http://blogs.msdn.com/ntdebugging/archive/2006/12/18/Understanding-Pool-Consumption-and-Event-ID_3A00_--2020-or-2019.aspx)

//Magnus
Trevor Talbot wrote:

> On 10/17/07, Magnus Hagander <magnus@hagander.net> wrote:
>> On Wed, Oct 17, 2007 at 02:40:14AM -0400, Tom Lane wrote:
>>> Maybe we should put an #ifdef WIN32 into guc.c to limit
>>> max_connections to something we know the platform can stand? It'd be
>>> more comfortable if we understood exactly where the limit was, but I
>>> think I'd rather have an "I'm sorry Dave, I can't do that" than
>>> random-seeming crashes.
>>
>> Yeah, that's probably a good idea - except we never managed to figure
>> out where the limit is. It appears to vary pretty wildly between
>> different machines, for reasons we don't really understand (total RAM
>> has some effect on it, but it's not the only one, for example).
>
> I tried generating idle connections in an effort to reproduce
> Laurent's problem, but I ran into a local limit instead: for each
> backend, postmaster creates a thread and burns 4 MB of its 2 GB address
> space. It fails around 490.

Oh, that's interesting. That's actually a side effect of us increasing the
stack size for the postgres.exe executable in order to work on other things.
By default, it burns 1 MB/thread, but ours will do 4 MB. Never really
thought of the problem that it'll run out of address space. Unfortunately,
that size can't be changed in the CreateThread() call - only the initially
committed size can be changed there.

There are two ways to get around it. One is not using a thread for each
backend, but a single thread that handles them all and some sync objects
around it. We originally considered this but decided not to bother changing
it, because the current way is simpler and the overhead of a thread is tiny
compared to a process. I don't think anybody even thought about the fact
that it'd run you out of address space...

The other way is to finish off win64 support :-) Which I plan to look at,
but I don't think that alone should be considered a solution.
The question is whether it's worth fixing that part, if it will just fall
down for other reasons before we reach those 500 connections anyway.

Can you try having your program actually run some queries and so on, not
just do a PQconnect? To see if it falls over then, because it's been doing
more?

> Laurent's issue must depend on other load characteristics. It's
> possible to get a trace of DLL loads, but I haven't found a
> noninvasive way of doing that. It seems to require that a debugger be
> attached.

AFAIK, it does require that, yes.

//Magnus
Magnus Hagander wrote:

> Trevor Talbot wrote:
>> On 10/20/07, Rainer Bauer <usenet@munnin.com> wrote:
>>
>>> Anyway, the problem is the number of semaphores created by Postgres:
>>> every backend creates at least 4*<max_connections> semaphores. Just
>>> increase <max_connections> to an unusually high value (say 10000) and
>>> start creating new connections while monitoring the handle count.
>>
>> Hmm, they're actually the same semaphores, so the only cost is for
>> slots in each process's handle table, which comes from kernel paged
>> pool. Testing shows I can easily create about 30 million handles to a
>> given object on this machine. This is under win2003 with 1.25 GB RAM,
>> which gives it a paged pool limit of 352 MB.

On my system I can only create about 4 million semaphores.

>> I tried going up to 20000 max_connections, and still blew postmaster's
>> VM space long before paged pool was exhausted. I couldn't test any
>> higher values, as there's some interaction between max_connections and
>> shared_buffers that prevents it from mapping the buffer contiguously.
>>
>> Something's missing though, since I'm not hitting the same issue you
>> are. How are you generating the connections? I just have an app
>> calling PQconnectdb() in a loop, but I guess that's not good enough.

I am using the ASCII version of the psqlODBC driver version 8.2.4.2 to
establish the test connections.

> Yeah, something is obviously missing. Are you guys on exactly the same
> Windows versions? WRT both version and service pack. Anybody on x64
> Windows?

No, I am using WinXP SP2 32-bit with 2 GB RAM.
These are my altered settings from the default 8.2.5 Postgres installation:

ssl = on
shared_buffers = 512MB
work_mem = 16MB
maintenance_work_mem = 256MB
wal_sync_method = fsync_writethrough
checkpoint_segments = 15
checkpoint_timeout = 30min
random_page_cost = 3.0
effective_cache_size = 1GB
autovacuum_vacuum_scale_factor = 0.10
autovacuum_analyze_scale_factor = 0.05

>Another thing worth testing - check if the amount of shared memory used
>makes a noticeable difference. Try both very small and very large values.

Well I tried different shared_buffers settings, but the result was
consistent: with max_connections set to 10000, I can create 150 database
connections.

However, I checked the handle count at the moment the last connection fails
and it is only at 1.5 million. So it seems the handles are not the primary
problem.

Let me know if you want any other tests performed on this machine. I also
have VS2005 installed, but until now I haven't compiled Postgres here (I
was waiting for 8.3, which fully supports building with VS).

Rainer
On Sun, Oct 21, 2007 at 09:43:27PM +0200, Rainer Bauer wrote:
> Magnus Hagander wrote:
>
> >Trevor Talbot wrote:
> >> On 10/20/07, Rainer Bauer <usenet@munnin.com> wrote:
> >>
> >>> Anyway, the problem is the no. of semaphores created by Postgres:
> >>> Every backend creates at least 4*<max_connections> semaphores. Just
> >>> increase <max_connections> to an unusually high value (say 10000) and
> >>> start creating new connections while monitoring the handle count.
> >>
> >> Hmm, they're actually the same semaphores, so the only cost is for
> >> slots in each process's handle table, which comes from kernel paged
> >> pool. Testing shows I can easily create about 30 million handles to a
> >> given object on this machine. This is under win2003 with 1.25GB RAM,
> >> which gives it a paged pool limit of 352MB.
>
> On my system I can only create about 4 million semaphores.

Is that 4 million semaphores, or 4 million handles to a smaller number of
semaphores?

> >> I tried going up to 20000 max_connections, and still blew postmaster's
> >> VM space long before paged pool was exhausted. I couldn't test any
> >> higher values, as there's some interaction between max_connections and
> >> shared_buffers that prevents it from mapping the buffer contiguously.
> >>
> >> Something's missing though, since I'm not hitting the same issue you
> >> are. How are you generating the connections? I just have an app
> >> calling PQconnectdb() in a loop, but I guess that's not good enough.
>
> I am using the ASCII version of the psqlODBC driver version 8.2.4.2 to
> establish the test connections.

Could you try the same tests with the client running on a different system?
Since the client eats up a bunch of handles and such as well, that would
eliminate the difference due to different clients.

> >Yeah, something is obviously missing.. Are you guys on exactly the
> >same Windows versions? WRT both version and service pack. Anybody on x64
> >Windows?
>
> No, I am using WinXP SP2 32 bit with 2GB RAM.

Ok. So one is on XP and one is on 2003. That's interesting - given that
2003 is tuned towards servers, it doesn't surprise me that it allows more
clients before breaking.

> These are my altered settings from the default 8.2.5 Postgres installation:
> ssl = on

Does it make a difference if you turn this off?

> shared_buffers = 512MB

As a general note, this is *way* too high. All evidence I've seen points to
that you should have shared_buffers as *small* as possible on win32,
because memory access there is slow. And leave more of the caching up to
the OS.

> work_mem = 16MB
> maintenance_work_mem = 256MB
> wal_sync_method = fsync_writethrough
> checkpoint_segments = 15
> checkpoint_timeout = 30min
> random_page_cost = 3.0
> effective_cache_size = 1GB
> autovacuum_vacuum_scale_factor = 0.10
> autovacuum_analyze_scale_factor = 0.05

None of those should make a difference on this.

> >Another thing worth testing - check if the amount of shared memory used
> >makes a noticeable difference. Try both very small and very large values.
>
> Well I tried different shared_buffers settings, but the result was
> consistent: with max_connections set to 10000, I can create 150 database
> connections.

Ok. But if you decrease max_connections, you can have more connections? Or
the other way around?

> However, I checked the handle count at the moment the last connection fails
> and it is only at 1.5 million. So it seems the handles are not the primary
> problem.

Good, it shouldn't be, but it's good to have that confirmed.

/Magnus
On Mon, Oct 22, 2007 at 10:23:16AM +0200, Magnus Hagander wrote:
> > >> I tried going up to 20000 max_connections, and still blew postmaster's
> > >> VM space long before paged pool was exhausted. I couldn't test any
> > >> higher values, as there's some interaction between max_connections and
> > >> shared_buffers that prevents it from mapping the buffer contiguously.
> > >>
> > >> Something's missing though, since I'm not hitting the same issue you
> > >> are. How are you generating the connections? I just have an app
> > >> calling PQconnectdb() in a loop, but I guess that's not good enough.
> >
> > I am using the ASCII version of the psqlODBC driver version 8.2.4.2 to
> > establish the test connections.
>
> Could you try the same tests with the client running on a different system?
> Since the client eats up a bunch of handles and such as well, that would
> eliminate the difference due to different clients.

Followup, when running these tests, could you check using Process Explorer
if you're hitting close to the limit of either of the two pools? See
http://blogs.technet.com/askperf/archive/2007/03/07/memory-management-understanding-pool-resources.aspx

//Magnus
On Mon, Oct 22, 2007 at 10:41:14AM +0200, Magnus Hagander wrote:
> On Mon, Oct 22, 2007 at 10:23:16AM +0200, Magnus Hagander wrote:
> > > >> I tried going up to 20000 max_connections, and still blew postmaster's
> > > >> VM space long before paged pool was exhausted. I couldn't test any
> > > >> higher values, as there's some interaction between max_connections and
> > > >> shared_buffers that prevents it from mapping the buffer contiguously.
> > > >>
> > > >> Something's missing though, since I'm not hitting the same issue you
> > > >> are. How are you generating the connections? I just have an app
> > > >> calling PQconnectdb() in a loop, but I guess that's not good enough.
> > >
> > > I am using the ASCII version of the psqlODBC driver version 8.2.4.2 to
> > > establish the test connections.
> >
> > Could you try the same tests with the client running on a different system?
> > Since the client eats up a bunch of handles and such as well, that would
> > eliminate the difference due to different clients.
>
> Followup, when running these tests, could you check using Process Explorer
> if you're hitting close to the limit of either of the two pools? See
> http://blogs.technet.com/askperf/archive/2007/03/07/memory-management-understanding-pool-resources.aspx

Another followup. Been working with Dave on and off today (well, him mostly
on to be honest, me a bit more on and off), and it seems that both our
repros clearly blame the desktop heap, and nothing else. Please use the
desktop heap tool and see if it breaks when the desktop heap usage
approaches 100%:

http://www.microsoft.com/downloads/details.aspx?familyid=5cfc9b74-97aa-4510-b4b9-b2dc98c8ed8b&displaylang=en

It'd still be good to know why the difference is so big between your two
systems.

//Magnus
Magnus Hagander wrote:
> Another followup. Been working with Dave on and off today (well, him mostly
> on to be honest, me a bit more on and off), and it seems that both our
> repros clearly blame the desktop heap, and nothing else. Please use the
> desktop heap tool and see if it breaks when the desktop heap usage
> approaches 100%:
>
> http://www.microsoft.com/downloads/details.aspx?familyid=5cfc9b74-97aa-4510-b4b9-b2dc98c8ed8b&displaylang=en
>
> It'd still be good to know why the difference is so big between your two
> systems.

Further info on this for the record - on XP Pro (which I'm testing on), the
desktop heap size defaults to 512KB for non-interactive sessions and 3072KB
for interactive.

In testing I find that I can get up to around 46 or so connections when
running as a service before desktop heap is exhausted and postgres dies.

When running interactively I can get a little over 125 connections before
things start dying; however, in that case it's *not* because of desktop
heap, because I can start a second cluster and run 125 connections on each
simultaneously. If I run three instances up together, one of them will die
as soon as desktop heap gets to 100% usage.

So, we seem to be hitting two limits here - the desktop heap, and something
else which is cluster-specific. Investigation continues...

Regards, Dave
Magnus Hagander schrieb:

>On Sun, Oct 21, 2007 at 09:43:27PM +0200, Rainer Bauer wrote:
>> Magnus Hagander wrote:
>>
>> >Trevor Talbot wrote:
>> >> On 10/20/07, Rainer Bauer <usenet@munnin.com> wrote:
>> >>
>> >>> Anyway, the problem is the no. of semaphores created by Postgres:
>> >>> Every backend creates at least 4*<max_connections> semaphores. Just
>> >>> increase <max_connections> to an unusually high value (say 10000) and
>> >>> start creating new connections while monitoring the handle count.
>> >>
>> >> Hmm, they're actually the same semaphores, so the only cost is for
>> >> slots in each process's handle table, which comes from kernel paged
>> >> pool. Testing shows I can easily create about 30 million handles to a
>> >> given object on this machine. This is under win2003 with 1.25GB RAM,
>> >> which gives it a paged pool limit of 352MB.
>>
>> On my system I can only create about 4 million semaphores.
>
>Is that 4 million semaphores, or 4 million handles to a smaller number of
>semaphores?

No, 4 million distinct semaphores by calling:

CreateSemaphore( NULL, 0, 1, NULL );

>> >> I tried going up to 20000 max_connections, and still blew postmaster's
>> >> VM space long before paged pool was exhausted. I couldn't test any
>> >> higher values, as there's some interaction between max_connections and
>> >> shared_buffers that prevents it from mapping the buffer contiguously.
>> >>
>> >> Something's missing though, since I'm not hitting the same issue you
>> >> are. How are you generating the connections? I just have an app
>> >> calling PQconnectdb() in a loop, but I guess that's not good enough.
>>
>> I am using the ASCII version of the psqlODBC driver version 8.2.4.2 to
>> establish the test connections.
>
>Could you try the same tests with the client running on a different system?
>Since the client eats up a bunch of handles and such as well, that would
>eliminate the difference due to different clients.
>Followup, when running these tests, could you check using Process Explorer
>if you're hitting close to the limit of either of the two pools? See
>http://blogs.technet.com/askperf/archive/2007/03/07/memory-management-understanding-pool-resources.aspx

Well, after installing Process Explorer and starting the system information
program, the kernel memory section shows me the current count, but not the
limits (it says "no symbols"). I am currently downloading the "Debugging
Tools for Windows". Maybe these limits are shown after the installation.

I just repeated the test with a local connection. After 150 connections,
the following values are displayed:

Paged physical  113000
Paged virtual   120000
Nonpaged         28000

Also there are 1,583,182 handles open. I will check the behaviour with a
remote connection later (have to go now...).

>> These are my altered settings from the default 8.2.5 Postgres installation:
>> ssl = on
>
>Does it make a difference if you turn this off?

No.

>> shared_buffers = 512MB
>
>As a general note, this is *way* too high. All evidence I've seen points to
>that you should have shared_buffers as *small* as possible on win32,
>because memory access there is slow. And leave more of the caching up to
>the OS.

I followed Josh's advice here:
<http://archives.postgresql.org/pgsql-performance/2007-06/msg00606.php>

What value would you recommend then? The default 32MB?

>> >Another thing worth testing - check if the amount of shared memory used
>> >makes a noticeable difference. Try both very small and very large values.
>>
>> Well I tried different shared_buffers settings, but the result was
>> consistent: with max_connections set to 10000, I can create 150 database
>> connections.
>
>Ok. But if you decrease max_connections, you can have more connections? Or
>the other way around?

A few tests indicated that the maximum no. of connections is 150,
regardless of the <max_connections> setting. But I will have to check
whether this is somehow caused by the ODBC driver.

Rainer
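For the record, the semaphore-exhaustion test Rainer describes boils down to a loop like the following. This is a Win32-only sketch reconstructed from the thread, not his actual test program; the CreateSemaphore() arguments mirror the call he quotes.

```c
/* Win32-only sketch: create distinct semaphores in a loop until the
   kernel refuses, to find the per-process/per-system limit. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    unsigned long count = 0;

    for (;;)
    {
        /* Each call creates a *new* nameless semaphore, so this burns
           kernel objects, not just handle-table slots. */
        HANDLE h = CreateSemaphore(NULL, 0, 1, NULL);

        if (h == NULL)
        {
            printf("failed after %lu semaphores (error %lu)\n",
                   count, (unsigned long) GetLastError());
            break;
        }
        count++;
        /* Handles are deliberately leaked until process exit; the point
           is to find the ceiling, as in the ~4 million figure reported. */
    }
    return 0;
}
```

This distinction matters for the thread: many handles to one semaphore only cost handle-table slots, while millions of distinct semaphores also cost kernel object memory, which is why the two tests hit different limits.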
Dave Page wrote:

>So, we seem to be hitting two limits here - the desktop heap, and
>something else which is cluster-specific. Investigation continues...

I will make these tests tonight or tomorrow morning and will let you know.

Rainer
On 10/22/07, Rainer Bauer <usenet@munnin.com> wrote:
> Well, after installing Process Explorer and starting the system information
> program, the kernel memory section shows me the current count, but not the
> limits (it says "no symbols"). I am currently downloading the "Debugging
> Tools for Windows". Maybe these limits are shown after the installation.

After you install that, go to Options->Configure Symbols in Process
Explorer. Change it to use the dbghelp.dll installed in the Debugging Tools
directory, and configure the symbol path like this:
http://support.microsoft.com/kb/311503

The limits should show up after that.
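For reference, the Microsoft symbol server path described in that KB article follows this pattern (the local cache directory C:\Symbols is an arbitrary choice, not something the article mandates):

```
SRV*C:\Symbols*http://msdl.microsoft.com/download/symbols
```

Paste that into the "Symbol path" field in Process Explorer's Configure Symbols dialog; symbols are then downloaded on demand and cached locally.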
On Mon, Oct 22, 2007 at 04:03:35PM +0200, Rainer Bauer wrote:
> >> shared_buffers = 512MB
> >
> >As a general note, this is *way* too high. All evidence I've seen points to
> >that you should have shared_buffers as *small* as possible on win32,
> >because memory access there is slow. And leave more of the caching up to
> >the OS.
>
> I followed Josh's advice here:
> <http://archives.postgresql.org/pgsql-performance/2007-06/msg00606.php>
>
> What value would you recommend then? The default 32MB?

That advice is good - for Unix platforms. For Windows, yes, try with 32MB.

//Magnus
Dave Page wrote:
> So, we seem to be hitting two limits here - the desktop heap, and
> something else which is cluster-specific. Investigation continues...

In further info, I've been testing this with the 8.3b1 release build that
we put out with pgInstaller, and a build with all optional dependencies
(OpenSSL, Kerberos, gettext, ldap etc) disabled. I'm seeing pretty much the
same results with each - roughly 9.6KB of desktop heap used per connection.

In addition, I've tried with standard pgbench runs, as well as a script
that just does 'select version()'. Again, no differences were observed.

Magnus and I did observe that we're using 1 user object and 4 GDI objects
per connection. If anyone happens to know how we might identify those,
please shout as so far we've drawn a blank :-(

Regards, Dave
On 10/21/07, Magnus Hagander <magnus@hagander.net> wrote:
> > I tried generating idle connections in an effort to reproduce
> > Laurent's problem, but I ran into a local limit instead: for each
> > backend, postmaster creates a thread and burns 4MB of its 2GB address
> > space. It fails around 490.
>
> Oh, that's interesting. That's actually a side effect of us increasing
> the stack size for the postgres.exe executable in order to work on other
> things. By default, it burns 1MB/thread, but ours will do 4MB. Never
> really thought of the problem that it'll run out of address space.
> Unfortunately, that size can't be changed in the CreateThread() call -
> only the initially committed size can be changed there.
>
> There are two ways to get around it - one is not using a thread for each
> backend, but a single thread that handles them all and then some sync
> objects around it. We originally considered this but said we won't
> bother changing it because the current way is simpler, and the overhead
> of a thread is tiny compared to a process. I don't think anybody even
> thought about the fact that it'd run you out of address space...

I'd probably take the approach of combining win32_waitpid() and threads.
You'd end up with 1 thread per 64 backends; when something interesting
happens the thread could push the info onto a queue, which the new
win32_waitpid() would check. Use APCs to add new backends to threads with
free slots.
On 10/22/07, Dave Page <dpage@postgresql.org> wrote:
> Dave Page wrote:
> > So, we seem to be hitting two limits here - the desktop heap, and
> > something else which is cluster-specific. Investigation continues...
>
> In further info, I've been testing this with the 8.3b1 release build
> that we put out with pgInstaller, and a build with all optional
> dependencies (OpenSSL, Kerberos, gettext, ldap etc) disabled. I'm seeing
> pretty much the same results with each - roughly 9.6KB of desktop heap
> used per connection.

The question is where that's coming from. I wondered if it was desktop
heap originally, but there's no reason it should be using it, and that
seems to be precisely the difference between my system and the others.
Connections here are barely making a dent; at 490 there's an entire 45KB
committed in the service desktop.

> Magnus and I did observe that we're using 1 user object and 4 GDI
> objects per connection. If anyone happens to know how we might identify
> those, please shout as so far we've drawn a blank :-(

Those appear to belong to the console window.

I've yet to do anything that generates real load (lightweight system), but
a simple "select version()" doesn't make any difference here either, and
raising shared buffers just makes postmaster run out of VM space faster.
(I don't think I mentioned that error before, but it shows up as "FATAL:
could not create sigchld waiter thread: error code 8".)
On Mon, Oct 22, 2007 at 08:04:03AM -0700, Trevor Talbot wrote:
> On 10/22/07, Dave Page <dpage@postgresql.org> wrote:
> > Dave Page wrote:
> > > So, we seem to be hitting two limits here - the desktop heap, and
> > > something else which is cluster-specific. Investigation continues...
> >
> > In further info, I've been testing this with the 8.3b1 release build
> > that we put out with pgInstaller, and a build with all optional
> > dependencies (OpenSSL, Kerberos, gettext, ldap etc) disabled. I'm seeing
> > pretty much the same results with each - roughly 9.6KB of desktop heap
> > used per connection.
>
> The question is where that's coming from. I wondered if it was
> desktop heap originally, but there's no reason it should be using it,
> and that seems to be precisely the difference between my system and
> the others. Connections here are barely making a dent; at 490 there's
> an entire 45KB committed in the service desktop.

Yes, that would be very interesting to know. Because obviously it's
something.

I read somewhere that Vista makes the size of the desktop heap dynamic, but
you were on 2003, right? Are you running the server as a service or from
the command prompt?

> > Magnus and I did observe that we're using 1 user object and 4 GDI
> > objects per connection. If anyone happens to know how we might identify
> > those, please shout as so far we've drawn a blank :-(
>
> Those appear to belong to the console window.

Makes sense - a window, a system menu, etc. There's probably a "hidden
console window" when running as a service...

> I've yet to do anything that generates real load (lightweight system),
> but a simple "select version()" doesn't make any difference here
> either, and raising shared buffers just makes postmaster run out of VM
> space faster. (I don't think I mentioned that error before, but it
> shows up as "FATAL: could not create sigchld waiter thread: error
> code 8".)

Yeah, that makes sense. We need to fix that, but I think that's too big of
a change to push during beta, given how few reports we've had of people
running into it.

//Magnus
On 10/22/07, Magnus Hagander <magnus@hagander.net> wrote:
> I read somewhere that Vista makes the size of the desktop heap dynamic, but
> you were on 2003, right?

Yeah, 32bit 2003 SP2, which has the same limits as XP. It looks like Vista
also has the same limits on actual heap sizes, but manages kernel address
space dynamically, so it doesn't get stuck with arbitrary limits there. I
don't have a Vista machine to verify though.

> Are you running the server as a service or from the command prompt?

Service, I've been using the standard MSI install of 8.2.5.

> > > Magnus and I did observe that we're using 1 user object and 4 GDI
> > > objects per connection. If anyone happens to know how we might identify
> > > those, please shout as so far we've drawn a blank :-(
> >
> > Those appear to belong to the console window.
>
> Makes sense - a window, a system menu, etc. There's probably a "hidden
> console window" when running as a service...

Well, the only thing actually running as a service is pg_ctl; the other
processes just belong to the same desktop. They're all console executables,
so they get the usual objects, but they're not visible anywhere.

It could be that there's a significant difference between XP and 2003 in
how that's handled though. I do have an XP SP2 machine here with 512MB RAM,
and I'll try tests on it as soon as I can free up what it's currently
occupied with.
Trevor Talbot wrote:
> The question is where that's coming from. I wondered if it was
> desktop heap originally, but there's no reason it should be using it,
> and that seems to be precisely the difference between my system and
> the others. Connections here are barely making a dent; at 490 there's
> an entire 45KB committed in the service desktop.

Hmm, Greg mentioned to me earlier that he was suspicious of SSPI, which
seems to drag in dependencies on gdi32.dll and user32.dll via secur32.dll.
Sure enough, testing with 8.2.5 on XP Pro, I get to 150 connections running
as a service having used 97.2% of desktop heap (vs. 45 connections max with
8.3).

So we have a pretty serious regression in 8.3.

Of course, that still doesn't tally up with what you're seeing on Win2k3.
I'll test on there tomorrow.

Regards, Dave
I wrote:

[ desktop heap usage ]

> It could be that there's a significant difference between XP and 2003
> in how that's handled though. I do have an XP SP2 machine here with
> 512MB RAM, and I'll try tests on it as soon as I can free up what it's
> currently occupied with.

...yep, under XP I'm using about 3.1KB of the service heap per connection,
which tears through it quite a bit faster. Now to figure out exactly where
it's coming from...
Trevor Talbot wrote:
> I wrote:
>
> [ desktop heap usage ]
>
>> It could be that there's a significant difference between XP and 2003
>> in how that's handled though. I do have an XP SP2 machine here with
>> 512MB RAM, and I'll try tests on it as soon as I can free up what it's
>> currently occupied with.
>
> ...yep, under XP I'm using about 3.1KB of the service heap per
> connection, which tears through it quite a bit faster. Now to figure
> out exactly where it's coming from...

That ties up with what I'm seeing - on 8.3 it's about 9.6KB per connection,
and I get a little under a third as many connections as 8.2 before it dies.

/D
Dave Page wrote:
> Trevor Talbot wrote:
>> The question is where that's coming from. I wondered if it was
>> desktop heap originally, but there's no reason it should be using it,
>> and that seems to be precisely the difference between my system and
>> the others. Connections here are barely making a dent; at 490 there's
>> an entire 45KB committed in the service desktop.
>
> Hmm, Greg mentioned to me earlier that he was suspicious of SSPI, which
> seems to drag in dependencies on gdi32.dll and user32.dll via
> secur32.dll. Sure enough, testing with 8.2.5 on XP Pro, I get to 150
> connections running as a service having used 97.2% of desktop heap (vs.
> 45 connections max with 8.3).
>
> So we have a pretty serious regression in 8.3.
>
> Of course, that still doesn't tally up with what you're seeing on
> Win2k3. I'll test on there tomorrow.

Could you try a build without SSPI? It should be as simple as removing the
#define ENABLE_SSPI 1 from port/win32.h. I don't think you need to touch
the linker lines at all, actually, so try without first.

//Magnus
Florian Weimer wrote:
> * Magnus Hagander:
>
>> Oh, that's interesting. That's actually a side effect of us increasing
>> the stack size for the postgres.exe executable in order to work on other
>> things. By default, it burns 1MB/thread, but ours will do 4MB. Never
>> really thought of the problem that it'll run out of address space.
>> Unfortunately, that size can't be changed in the CreateThread() call -
>> only the initially committed size can be changed there.
>
> Windows XP supports the STACK_SIZE_PARAM_IS_A_RESERVATION flag, which
> apparently allows reducing the reserved size. It might be better to do
> this the other way round, though (leave the reservation at its 1 MB
> default, and increase it only when necessary).

It does, but we still support Windows 2000 as well. I think it's better to
use a different method altogether - one not using one thread per child.

//Magnus
* Magnus Hagander:

> Oh, that's interesting. That's actually a side effect of us increasing
> the stack size for the postgres.exe executable in order to work on other
> things. By default, it burns 1MB/thread, but ours will do 4MB. Never
> really thought of the problem that it'll run out of address space.
> Unfortunately, that size can't be changed in the CreateThread() call -
> only the initially committed size can be changed there.

Windows XP supports the STACK_SIZE_PARAM_IS_A_RESERVATION flag, which
apparently allows reducing the reserved size. It might be better to do this
the other way round, though (leave the reservation at its 1 MB default, and
increase it only when necessary).
Trevor Talbot wrote:
> On 10/21/07, Magnus Hagander <magnus@hagander.net> wrote:
>
>>> I tried generating idle connections in an effort to reproduce
>>> Laurent's problem, but I ran into a local limit instead: for each
>>> backend, postmaster creates a thread and burns 4MB of its 2GB address
>>> space. It fails around 490.
>>
>> Oh, that's interesting. That's actually a side effect of us increasing
>> the stack size for the postgres.exe executable in order to work on other
>> things. By default, it burns 1MB/thread, but ours will do 4MB. Never
>> really thought of the problem that it'll run out of address space.
>> Unfortunately, that size can't be changed in the CreateThread() call -
>> only the initially committed size can be changed there.
>>
>> There are two ways to get around it - one is not using a thread for each
>> backend, but a single thread that handles them all and then some sync
>> objects around it. We originally considered this but said we won't
>> bother changing it because the current way is simpler, and the overhead
>> of a thread is tiny compared to a process. I don't think anybody even
>> thought about the fact that it'd run you out of address space...
>
> I'd probably take the approach of combining win32_waitpid() and
> threads. You'd end up with 1 thread per 64 backends; when something
> interesting happens the thread could push the info onto a queue, which
> the new win32_waitpid() would check. Use APCs to add new backends to
> threads with free slots.

I was planning to make it even easier and let Windows do the job for us,
just using RegisterWaitForSingleObject(). Does the same - one thread per 64
backends, but we don't have to deal with the queueing ourselves. Should be
rather trivial to do. Keeps win32_waitpid() unchanged.

That said, refactoring win32_waitpid() to be based on a queue might be a
good idea *anyway*. Have the callback from above put something in the
queue, and go with your idea for the rest.

//Magnus
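The approach Magnus describes can be sketched as follows. This is a Win32-only illustration of the API shape, not PostgreSQL's actual implementation; the callback and function names (child_exited, watch_backend) are invented for the example.

```c
/* Win32-only sketch: let the kernel's thread pool wait on child process
   handles instead of dedicating one 4MB-stack thread per backend. */
#include <windows.h>
#include <stdio.h>

/* Runs on a thread-pool thread when the child's process handle is
   signaled, i.e. when the backend exits. */
static VOID CALLBACK child_exited(PVOID context, BOOLEAN timed_out)
{
    DWORD pid = (DWORD)(ULONG_PTR) context;

    (void) timed_out;   /* INFINITE timeout below, so never TRUE here */
    /* Real code would queue (pid, exit code) and trigger the SIGCHLD
       emulation rather than printf from a pool thread. */
    printf("backend %lu exited\n", (unsigned long) pid);
}

static BOOL watch_backend(HANDLE proc, DWORD pid, PHANDLE wait_handle)
{
    /* The thread pool multiplexes up to MAXIMUM_WAIT_OBJECTS (64) waits
       per internal thread, so this scales far better than one dedicated
       waiter thread per backend. */
    return RegisterWaitForSingleObject(wait_handle, proc,
                                       child_exited,
                                       (PVOID)(ULONG_PTR) pid,
                                       INFINITE,
                                       WT_EXECUTEONLYONCE);
}
```

WT_EXECUTEONLYONCE deregisters the wait after the first completion, which matches process exit being a one-shot event; the returned wait handle still needs an UnregisterWait() during cleanup.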
Magnus Hagander wrote:
> Could you try a build without SSPI? It should be as simple as removing
> the #define ENABLE_SSPI 1 from port/win32.h. I don't think you need to
> touch the linker lines at all, actually, so try without first.

Nope, doesn't help - still using around 9.7KB per connection. Just to be
sure I did remove the link option, and checking with depends see that there
are now only delay load references to secur32.dll, nothing direct - as is
the case with 8.2.

So the only other changes I can think of that might affect things are the
VC++ build or the shared memory changes, though I can't see why they would
cause problems offhand. I'll go try a mingw build...

/D
Dave Page wrote:
> Magnus Hagander wrote:
>> Could you try a build without SSPI? It should be as simple as removing
>> the #define ENABLE_SSPI 1 from port/win32.h. I don't think you need to
>> touch the linker lines at all, actually, so try without first.
>
> Nope, doesn't help - still using around 9.7KB per connection. Just to be
> sure I did remove the link option, and checking with depends see that
> there are now only delay load references to secur32.dll, nothing direct
> - as is the case with 8.2.

OK. That makes sense, actually...

> So the only other changes I can think of that might affect things are
> the VC++ build or the shared memory changes, though I can't see why they
> would cause problems offhand. I'll go try a mingw build...

Yeah, it could be that the newer MSVCRT files do something we don't like.
Other than that, did we upgrade to a different version of some of our
dependencies?

Also, is this the DEBUG or RELEASE build of 8.3?

//Magnus
On 10/22/07, Magnus Hagander <magnus@hagander.net> wrote:
> Trevor Talbot wrote:
> > I'd probably take the approach of combining win32_waitpid() and
> > threads. You'd end up with 1 thread per 64 backends; when something
> > interesting happens the thread could push the info onto a queue, which
> > the new win32_waitpid() would check. Use APCs to add new backends to
> > threads with free slots.
>
> I was planning to make it even easier and let Windows do the job for us,
> just using RegisterWaitForSingleObject(). Does the same - one thread per
> 64 backends, but we don't have to deal with the queueing ourselves.

Oh, good call -- I keep forgetting the native thread pool exists.
Dave Page wrote:
> So the only other changes I can think of that might affect things are
> the VC++ build or the shared memory changes, though I can't see why they
> would cause problems offhand. I'll go try a mingw build...

A mingw build of stock 8.3b1, with no configure options specified at all,
consumes 3.2KB of desktop heap per connection. So, it's either something
we're doing differently with the VC++ compile/link options, or it's the VC8
runtimes using more resources.

Oh, and I still see the second limitation where it bombs out over about 125
connections, so that isn't build/runtime specific.

Shall we take this over to -hackers btw?

/D
Magnus Hagander wrote:
> Yeah, it could be that the newer MSVCRT files do something we don't
> like.. Other than that, did we upgrade to a different version of some of
> our dependents?

Most of them - but my test build is without any of them:

our $config = {
    asserts=>1,            # --enable-cassert
    integer_datetimes=>1,  # --enable-integer-datetimes
    nls=>undef,            # --enable-nls=<path>
    tcl=>undef,            # --with-tls=<path>
    perl=>undef,           # --with-perl
    python=>undef,         # --with-python=<path>
    krb5=>undef,           # --with-krb5=<path>
    ldap=>0,               # --with-ldap
    openssl=>undef,        # --with-ssl=<path>
    xml=>undef,
    xslt=>undef,
    iconv=>undef,
    zlib=>undef            # --with-zlib=<path>
};

> Also, is this the DEBUG or RELEASE build of 8.3?

Both behave similarly.

/D
Magnus Hagander <magnus@hagander.net> writes:
> I was planning to make it even easier and let Windows do the job for us,
> just using RegisterWaitForSingleObject(). Does the same - one thread per
> 64 backends, but we don't have to deal with the queueing ourselves.
> Should be rather trivial to do.

How can that possibly work? Backends have to be able to run concurrently,
and I don't see how they'll do that if they share a stack.

regards, tom lane
On 10/22/07, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Magnus Hagander <magnus@hagander.net> writes: > > I was planning to make it even easier and let Windows do the job for us, > > just using RegisterWaitForSingleObject(). Does the same - one thread per > > 64 backends, but we don't have to deal with the queueing ourselves. > > Should be rather trivial to do. > > How can that possibly work? Backends have to be able to run > concurrently, and I don't see how they'll do that if they share a stack. This is about what postmaster does for its SIGCHLD wait equivalent on win32. The 64 comes from Windows' object/event mechanism, which lets you perform a blocking wait on up to that many handles in a single call. Currently postmaster is creating a new thread to wait on only one backend at a time, so it ends up with too many threads.
Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> I was planning to make it even easier and let Windows do the job for us,
>> just using RegisterWaitForSingleObject(). Does the same - one thread per
>> 64 backends, but we don't have to deal with the queueing ourselves.
>> Should be rather trivial to do.
>
> How can that possibly work? Backends have to be able to run
> concurrently, and I don't see how they'll do that if they share a stack.

We're not talking about the backends, we're talking about the backend
waiter threads, whose sole purpose is to wait for a backend to die and
then raise a signal when it does. We can easily have the kernel wait for
a whole bunch of them at once, and have it call our callback function
whenever any one of them dies.

//Magnus
Magnus Hagander <magnus@hagander.net> writes: > We're not talking about the backends, we're talking about the backend > waiter threads whose sole purpose is to wait for a backend to die and > then raise a signal when it does. Oh, OK, I had not twigged to exactly what the threads were being used for. Never mind ... regards, tom lane
"Trevor Talbot" wrote:
>I wrote:
>
>[ desktop heap usage ]
>
>> It could be that there's a significant difference between XP and 2003
>> in how that's handled though. I do have an XP SP2 machine here with
>> 512MB RAM, and I'll try tests on it as soon as I can free up what it's
>> currently occupied with.
>
>...yep, under XP I'm using about 3.1KB of the service heap per
>connection, which tears through it quite a bit faster. Now to figure
>out exactly where it's coming from...

I can confirm this here (WinXP SP2).

I have restored the original postgresql.conf file that was created when
the cluster was initialized with Postgres 8.2.4-1 (the installed version
now is 8.2.5-1). The only other change to this installation is that I
have moved the WAL directory pg_xlog to another drive using a junction
link.

Here are my numbers from SysInternals System Information program:
Pages Limit: 364544KB [356MB]
Nonpaged Limit: 262144KB [256MB]
These limits are never reached.

Using the Desktop Heap Monitor, every new connection consumes 3232 bytes
of the total 512KB heap.

>It could be that there's a significant difference between XP and 2003
>in how that's handled though. I do have an XP SP2 machine here with
>512MB RAM, and I'll try tests on it as soon as I can free up what it's
>currently occupied with.

Yeah, Win2003 behaves differently according to this source:
<http://blogs.msdn.com/ntdebugging/archive/2007/01/04/desktop-heap-overview.aspx>

<quote>
Session paged pool allows session specific paged pool allocations. Windows XP
uses regular paged pool, since the number of remote desktop connections is
limited. On the other hand, Windows Server 2003 makes allocations from
session paged pool instead of regular paged pool if Terminal Services
(application server mode) is installed.
</quote>

After increasing the session heap size in the registry from 512KB to
1024KB, the number of connections was roughly doubled. So this might be
a solution for people running out of Desktop heap.
Alter the value of the following key:
<HKLM\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows>

The numeric values following "SharedSection=" control the heap management.

On WinXP these are the default values: "SharedSection=1024,3072,512"

Altering this to "SharedSection=1024,3072,1024" will increase the heap
for all non-interactive window stations to 1024KB.

Rainer
Rainer Bauer wrote:
>> ...yep, under XP I'm using about 3.1KB of the service heap per
>> connection, which tears through it quite a bit faster. Now to figure
>> out exactly where it's coming from...
>
> I can confirm this here (WinXP SP2).

It's coming from direct dependencies on user32.dll (from which we use
wsprintf()) and shell32.dll (from which we use SHGetSpecialFolderPath())
and is allocated when ResumeThread() is called to kickstart the new
backend, but before the backend actually does anything (proven with a
while(1) loop in main() for the -forkbackend case with a breakpoint on
ResumeThread() in the postmaster).

I've submitted a patch against 8.3 that removes these dependencies
altogether. Unfortunately, it seems we still have indirect dependencies
on user32.dll which I don't believe we can do anything about.

In testing, the patch reduces the per-connection desktop heap usage from
around 9.7KB to 3.2KB, which is back in line with 8.2.

Regards, Dave
Dave, > It's coming from direct dependencies on user32.dll (from which we use > wsprintf()) and shell32.dll (from which we use SHGetSpecialFolderPath()) > and is allocated when ResumeThread() is called to kickstart the new > backend, why does every backend need its own heap for user32.dll or shell32.dll? Wasn't the point of shared dlls to be shared? Harald -- GHUM Harald Massa persuadere et programmare Harald Armin Massa Spielberger Straße 49 70435 Stuttgart 0173/9409607 fx 01212-5-13695179 - EuroPython 2008 will take place in Vilnius, Lithuania - Stay tuned!
"Harald Armin Massa" <haraldarminmassa@gmail.com> writes: > Dave, > >> It's coming from direct dependencies on user32.dll (from which we use >> wsprintf()) and shell32.dll (from which we use SHGetSpecialFolderPath()) >> and is allocated when ResumeThread() is called to kickstart the new >> backend, > > why does every backend need its own heap for user32.dll or > shell32.dll? Wasn't the point of shared dlls to be shared? The Desktop Heap appears to be a place for processes belonging to the same "desktop" to allocate shared objects such as GUI elements. These are allocated in shared space so they can be manipulated by any process running in that "desktop". Why Shell32 and User32 are allocating space in there just to initialize themselves or handle these basic utility functions is a bit of a mystery. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Harald Armin Massa wrote:
> Dave,
>
>> It's coming from direct dependencies on user32.dll (from which we use
>> wsprintf()) and shell32.dll (from which we use SHGetSpecialFolderPath())
>> and is allocated when ResumeThread() is called to kickstart the new
>> backend,
>
> why does every backend need its own heap for user32.dll or
> shell32.dll? Wasn't the point of shared dlls to be shared?

No idea, and I thought so. It's quite easy to prove using the test
program attached. Just monitor the desktop heap with dheapmon (from
Microsoft's website), and run the program with a single command line
argument to get it to spawn 100 child processes. You can stop it
loading various DLLs by commenting out the dummy calls to functions in
them and rebuilding.

Of course, none of this would be an issue if we made the backend
multithreaded. :-) I'll get my coat...

/D

#include <stdio.h>
#include <Windows.h>
#include <winsock.h>
#define SECURITY_WIN32
#include <Security.h>
#include <shlobj.h>

int main(int argc, char *argv[])
{
    // Dummy functions to force linking to specific libs

    // user32.lib
    IsCharAlpha('a');

    // wsock32.lib
    WSADATA wsaData;
    WSAStartup(MAKEWORD(1, 1), &wsaData);

    // secur32.lib
    char un[30];
    DWORD dwUNLen = 30;
    GetUserNameExA(NameUserPrincipal, un, &dwUNLen);

    // advapi32.dll
    char un2[30];
    DWORD dwUN2Len = 30;
    GetUserNameA(un2, &dwUN2Len);

    // shell32.dll
    IsUserAnAdmin();

    // Used by child processes
    if (argc == 1)
    {
        while (1)
        {
            printf("Foo\n");
            Sleep(2000);
        }
    }
    else
    {
        for (int x = 0; x < 100; x++)
        {
            STARTUPINFOA si;
            PROCESS_INFORMATION pi;

            memset(&pi, 0, sizeof(pi));
            memset(&si, 0, sizeof(si));
            si.cb = sizeof(si);

            /*
             * Create the subprocess in a suspended state. This will be
             * resumed later, once we have written out the parameter file.
             */
            printf("Creating process %d...\n", x);
            if (!CreateProcessA(NULL, argv[0], NULL, NULL, TRUE,
                                CREATE_SUSPENDED, NULL, NULL, &si, &pi))
            {
                printf("CreateProcess call failed: %m (error code %d)",
                       (int) GetLastError());
                return -1;
            }

            printf("Resuming thread %d...\n", x);
            if (ResumeThread(pi.hThread) == -1)
            {
                if (!TerminateProcess(pi.hProcess, 255))
                {
                    printf("could not terminate unstartable process: error code %d",
                           (int) GetLastError());
                    CloseHandle(pi.hProcess);
                    CloseHandle(pi.hThread);
                    return -1;
                }
                CloseHandle(pi.hProcess);
                CloseHandle(pi.hThread);
                printf("could not resume thread of unstarted process: error code %d",
                       (int) GetLastError());
                return -1;
            }
        }
    }
    return 0;
}
> > why does every backend need its own heap for user32.dll or
> > shell32.dll? Wasn't the point of shared dlls to be shared?
>
> The Desktop Heap appears to be a place for processes belonging to the same
> "desktop" to allocate shared objects such as GUI elements. These are allocated
> in shared space so they can be manipulated by any process running in that
> "desktop".

Using this knowledge and Dave's response, and also looking back at
"3.2KB per backend", I stumbled upon this KB entry:

http://support.microsoft.com/?scid=kb%3Ben-us%3B184802&x=15&y=14

Please pay special attention to the following parts:

%SystemRoot%\system32\csrss.exe ObjectDirectory=\Windows
SharedSection=1024,3072,512 Windows=On SubSystemType=Windows
ServerDll=basesrv,1 ServerDll=winsrv:UserServerDllInitialization,3
ServerDll=winsrv:ConServerDllInitialization,2 ProfileControl=Off
MaxRequestThreads=16

"""The second SharedSection value (3072) is the size of the desktop heap
for each desktop that is associated with the "interactive" window
station WinSta0."""

and further down:

"""All services that are executed under the LocalSystem account with the
Allow Service to Interact with Desktop startup option selected will use
"Winsta0\Default". All these processes will share the same desktop heap
associated with the "Default" application desktop."""

Postgres is definitely NOT started under the LocalSystem account; so,
applying a "logical not" to Microsoft's words, could that indicate the
reason why our service backends consume that memory? Add to this that MS
SQL Server runs as LocalSystem, and as far as I know so does Oracle.

Is this a path of thinking to try?

Harald
--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
fx 01212-5-13695179
-
EuroPython 2008 will take place in Vilnius, Lithuania - Stay tuned!
Replying to myself.... > Postgres is definitely NOT started as LocalSystem account; so using a > "logical not" on Microsofts Words that could indicate the reason why > our service-backends consume that memory? Add to this that MS SQL runs > as LocalSystem; and as much as I know also Oracle. just some lines further down: """Every service process executed under a user account will receive a new desktop in a noninteractive window station created by the Service Control Manager (SCM). Thus, each service executed under a user account will consume the number of kilobytes of desktop heap specified in the third SharedSection value. All services executed under the LocalSystem account with Allow Service to Interact with the Desktop not selected share the desktop heap of the "Default" desktop in the noninteractive service windows station (Service-0x0-3e7$).""" it is exactly as suspected ... just starting the service allocates that heap Harald -- GHUM Harald Massa persuadere et programmare Harald Armin Massa Spielberger Straße 49 70435 Stuttgart 0173/9409607 fx 01212-5-13695179 - EuroPython 2008 will take place in Vilnius, Lithuania - Stay tuned!
Harald Armin Massa wrote: > Replying to myself.... >> Postgres is definitely NOT started as LocalSystem account; so using a >> "logical not" on Microsofts Words that could indicate the reason why >> our service-backends consume that memory? Add to this that MS SQL runs >> as LocalSystem; and as much as I know also Oracle. > > just some lines further down: > > """Every service process executed under a user account will receive a > new desktop in a noninteractive window station created by the Service > Control Manager (SCM). Thus, each service executed under a user > account will consume the number of kilobytes of desktop heap specified > in the third SharedSection value. All services executed under the > LocalSystem account with Allow Service to Interact with the Desktop > not selected share the desktop heap of the "Default" desktop in the > noninteractive service windows station (Service-0x0-3e7$).""" > > > it is exactly as suspected ... just starting the service allocates that heap You're missing the point I think. There's 48MB (iirc) on XP that is reserved for desktop heaps. From that, it allocates 64KB for WinSta0\Disconnect, 128KB for WinSta0\Winlogon and 3072KB for WinSta0\Default (ie. the regular desktop). Each additional session started by the SCM gets allocated the non-interactive default of 512KB. It's not the 48MB we're running out of, it's the 512KB. That's why if you look back in the thread, you'll see I found 8.3 was crashing with 46 connections when running as a service, but with much higher numbers of connections when run from the logon session. The reason why Microsoft services don't consume so much heap is that they are multi-threaded, not multi-process so they don't init user32.dll etc. for each individual connection like we do, but only once for the whole server. /D
On 10/23/07, Harald Armin Massa <haraldarminmassa@gmail.com> wrote: > > The Desktop Heap appears to be a place for processes belonging to the same > > "desktop" to allocate shared objects such as GUI elements. These are allocated > > in shared space so they can be manipulated by any process running in that > > "desktop". > > Using this knowledge and Daves response, also looking back at "3,2kb > per backend", I stumbled upon that KB entry: > > http://support.microsoft.com/?scid=kb%3Ben-us%3B184802&x=15&y=14 [...] > Postgres is definitely NOT started as LocalSystem account; so using a > "logical not" on Microsofts Words that could indicate the reason why > our service-backends consume that memory? Add to this that MS SQL runs > as LocalSystem; and as much as I know also Oracle. It's not quite what you think. The link Rainer posted upthread does a decent job describing it, although there's still some room for confusion: http://blogs.msdn.com/ntdebugging/archive/2007/01/04/desktop-heap-overview.aspx The hierarchy of containers goes Session, Window Station, Desktop. Everything relevant is under the same Session, so I'll ignore that for now. The console gets a Window Station; this is the interactive one since the user sitting down works with it directly. It normally contains one Desktop of interest (Default), which is what the user actually sees. (It's possible to create multiple desktops as a framework for a "virtual desktop" type of thing, but that's all third-party stuff.) Each service registered with the Service Manager has a specific account it logs in under. For each account, the Service Manager creates a Window Station to contain it, and all services using the same account share the default Desktop inside it. Most services run under one of the 3 canned accounts, which is what that KB article is talking about with the Local System bit. Each Desktop created has a fixed-size chunk of memory allocated to it. 
Desktops created under the interactive Window Station get the larger chunk of memory (3072KB) since they expect to contain lots of UI stuff. Desktops created under other Window Stations get the smaller chunk of memory (512KB), since they aren't presenting a UI to the user. That fixed-size desktop heap is used to track objects handled by the USER subsystem, which is mostly UI elements like windows and such. Most of the API interaction for those resources go through user32.dll, and apparently its initialization procedure grabs some of that heap space for each process it's loaded into. The PostgreSQL service is set to log in under its own account, so it gets its own Window Station, and a default Desktop inside that. This is a non-interactive Window Station, so the Desktop gets the smaller heap. All postgres.exe processes run in that Desktop and share one 512KB heap. As each process ends up carving out a chunk of that space, it uses up all 512KB and fails to create more backends.
On 10/23/07, Rainer Bauer <usenet@munnin.com> wrote: > "Trevor Talbot" wrote: > >It could be that there's a significant difference between XP and 2003 > >in how that's handled though. I do have an XP SP2 machine here with > >512MB RAM, and I'll try tests on it as soon as I can free up what it's > >currently occupied with. > > Yeah, Win2003 behaves differently accoriding to this source: > <http://blogs.msdn.com/ntdebugging/archive/2007/01/04/desktop-heap-overview.aspx> > > <quote> > Session paged pool allows session specific paged pool allocations. Windows XP > uses regular paged pool, since the number of remote desktop connections is > limited. On the other hand, Windows Server 2003 makes allocations from > session paged pool instead of regular paged pool if Terminal Services > (application server mode) is installed. > </quote> That's a little different. There's a specific range of kernel VM space dedicated to session-specific data, so each session references the same addresses but it can be backed by different physical memory (same concept as separate processes). The session paged pool area of that VM space is used to allocate the individual desktop heaps from. It's saying that under XP, it's mapped to the main kernel paged pool, while under 2003 TS it's mapped to session-specific memory, to avoid depleting the main paged pool. (Each Terminal Services connection creates an entire Session.) It doesn't change how desktop heap is actually used though, which is the issue we're running into. The system I'm testing on doesn't have Terminal Services running in appserver mode. > After increasing the session heap size in the registry from 512KB to 1024KB > the no. of connections was roughly doubled. So this might be a solution for > people running out of Desktop heap. 
> > Alter the value of the following key > <HKLM\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows> > > The numeric values following "SharedSection=" control the heap management: > On WinXP these are the default values: "SharedSection=1024,3072,512" > Altering this to "SharedSection=1024,3072,1024" will increase the heap for all > non-interactive window stations to 1024KB. It's probably safe to do on a typical XP box, but it's unfortunately not something you want the installer to do, or even suggest as blanket advice. I also wondered about having postmaster create more desktops on demand, but that has about the same amount of sanity (i.e. not much). I think it boils down to getting postgres to avoid using desktop heap if at all possible, and if not, advising people to avoid XP for high concurrency, except for suggesting the above change in specific circumstances. I suspect win2000 has the same issue, but I don't have a system to test. It'd be interesting to know if 2000 Professional behaves any differently than Server.
Rainer Bauer wrote: > After increasing the session heap size in the registry from 512KB to 1024KB > the no. of connections was roughly doubled. So this might be a solution for > people running out of Desktop heap. > > Alter the value of the following key > <HKLM\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows> > > The numeric values following "SharedSection=" control the heap management: > On WinXP these are the default values: "SharedSection=1024,3072,512" > Altering this to "SharedSection=1024,3072,1024" will increase the heap for all > non-interactive window stations to 1024KB. This part should go in the FAQ, I think. It's valid for 8.2 as well, from what I can tell, and it's valid for 8.3 both before and after the patch I just applied. Dave, you're listed as maintainer :-P //Magnus
Magnus Hagander wrote: > Rainer Bauer wrote: >> After increasing the session heap size in the registry from 512KB to 1024KB >> the no. of connections was roughly doubled. So this might be a solution for >> people running out of Desktop heap. >> >> Alter the value of the following key >> <HKLM\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows> >> >> The numeric values following "SharedSection=" control the heap management: >> On WinXP these are the default values: "SharedSection=1024,3072,512" >> Altering this to "SharedSection=1024,3072,1024" will increase the heap for all >> non-interactive window stations to 1024KB. > > This part should go in the FAQ, I think. It's valid for 8.2 as well, > from what I can tell, and it's valid for 8.3 both before and after the > patch I just applied. > > Dave, you're listed as maintainer :-P done. /D
Dave Page wrote:
>Magnus Hagander wrote:
>> Rainer Bauer wrote:
>>> After increasing the session heap size in the registry from 512KB to 1024KB
>>> the no. of connections was roughly doubled. So this might be a solution for
>>> people running out of Desktop heap.
>>>
>>> Alter the value of the following key
>>> <HKLM\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows>
>>>
>>> The numeric values following "SharedSection=" control the heap management:
>>> On WinXP these are the default values: "SharedSection=1024,3072,512"
>>> Altering this to "SharedSection=1024,3072,1024" will increase the heap for all
>>> non-interactive window stations to 1024KB.
>>
>> This part should go in the FAQ, I think. It's valid for 8.2 as well,
>> from what I can tell, and it's valid for 8.3 both before and after the
>> patch I just applied.
>>
>> Dave, you're listed as maintainer :-P
>
>done.

Dave, could you add that it's the third parameter of the "SharedSection"
string that must be changed? I read that KB article, but still had to
find the correct one by trial and error, which required a reboot every
time.

Rainer
> ------- Original Message -------
> From: Rainer Bauer <usenet@munnin.com>
> To: pgsql-general@postgresql.org
> Sent: 26/10/07, 18:09:26
> Subject: Re: [GENERAL] 8.2.3: Server crashes on Windows using Eclipse/Junit
>
> Dave could you add that it's the third parameter of the "SharedSection" string
> that must be changed. I read that KB article, but still had to find the
> correct one by trial and error, which required a reboot every time.

Err, it does say that:

"You can increase the non-interactive Desktop Heap by modifying the
third SharedSection value in the registry as described in this Microsoft
Knowledgebase article."

/D
"Dave Page" wrote: >> ------- Original Message ------- >> From: Rainer Bauer <usenet@munnin.com> >> To: pgsql-general@postgresql.org >> Sent: 26/10/07, 18:09:26 >> Subject: Re: [GENERAL] 8.2.3: Server crashes on Windows using Eclipse/Junit >> >> Dave could you add that it's the third parameter of the "SharedSection" string >> that must be changed. I read that KB article, but still had to find the >> correct one by trial and error, which required a reboot every time. > >Err, it does say that: > >You can increase the non-interactive Desktop Heap by modifying the third SharedSection value in the registry as describedin this Microsoft Knowledgebase article. Must have overlooked that part. Sorry for the noise. Rainer
Hi,

I'm not sure if this is good netiquette or not: I'm reviving a
month-old thread, because I'm trying to figure out how to resolve the
issue.

To summarize: when I run unit tests with Eclipse (and with Ant) on
Windows, at some point, I run out of available connections. I tried
increasing the maximum number of connections, but then I started seeing
the postgres server die and restart. I'm trying to fix this, yet again,
but I don't have a clear idea of what to fix.

On Tue, 23 Oct 2007 20:07:22 +0200, Magnus Hagander wrote:
> Rainer Bauer wrote:
>> After increasing the session heap size in the registry from 512KB to 1024KB
>> the no. of connections was roughly doubled. So this might be a solution for
>> people running out of Desktop heap.
>>
>> Alter the value of the following key
>> <HKLM\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows>
>>
>> The numeric values following "SharedSection=" control the heap management:
>> On WinXP these are the default values: "SharedSection=1024,3072,512"
>> Altering this to "SharedSection=1024,3072,1024" will increase the heap for all
>> non-interactive window stations to 1024KB.

Does this allow creating more connections? At some point, the discussion
became too technical for me, and I no longer could tell if the answer
was for developers or for users.

I saw other messages dealing with semaphores/connection relations, etc.
But unless I really did not understand the discussion, none of them
seemed to address the issue I was seeing.

I'm thinking that the Java driver combined with Hibernate may be keeping
handles open for too long, because my tests aren't supposed to maintain
connections open for very long. I also would expect the connections to
either be closed or released once the statements are executed.

> This part should go in the FAQ, I think. It's valid for 8.2 as well,
> from what I can tell, and it's valid for 8.3 both before and after the
> patch I just applied.
>
> Dave, you're listed as maintainer :-P
>
> //Magnus

--
Prenez la parole en public en étant    Speak to an audience while being
moins nerveux et plus convaincant!     less nervous and more convincing!
Abonnez-vous au bulletin gratuit!      Sign up for the free newsletter!
http://www.duperval.com (514) 902-0186
On 11/29/07, Laurent Duperval <lduperval@yahoo.com> wrote: > To summarize: when I run unit tests with eclipse (and with Ant) on > Windows, at some point, I run out of available connections. I tried > increasing the maximum number of connections, but then I started seeing > the postgres server die and restart. The conclusion was that under Windows XP, postgres is normally limited to a maximum of 125-150 connections. Raising max_connections higher than that will lead to the crashes you saw. > > Rainer Bauer wrote: > >> After increasing the session heap size in the registry from 512KB to 1024KB > >> the no. of connections was roughly doubled. So this might be a solution for > >> people running out of Desktop heap. > >> > >> Alter the value of the following key > >> <HKLM\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows> > >> > >> The numeric values following "SharedSection=" control the heap management: > >> On WinXP these are the default values: "SharedSection=1024,3072,512" > >> Altering this to "SharedSection=1024,3072,1024" will increase the heap for all > >> non-interactive window stations to 1024KB. > Does this allow creating more connections? At some point, the discussion > became too technical for me, and I no longer could tell if the answer was > for developers of for users. Yes. After making that change and restarting Windows, postgres will be able to safely handle 250-300 connections. > I saw other messages dealing with semaphores/connection relations, etc. > But unless I really did not understand the discussion, none of them seemed > to address the issue I was seeing. Right, we were just trying to find the precise resource limit that was causing the crash. > I'm thinking that the Java driver combined with Hibernate may be keeping > handles open for too long, because my tests aren't supposed to maintain > connections open for very long. I also would expect the connections to > either be closed or released once the statements are executed. 
This is where I would start on your problem. Increasing the max connections is one thing, but having so very many simultaneous operations in progress on your database is probably not productive, as it's likely to spend more time juggling tasks than actually performing them. I'm not familiar with Java tools, so someone else will have to chime in with specific suggestions. It may be something as simple as limiting how many tests JUnit/Ant tries to run at the same time, or some parameter buried in Hibernate or the driver.
Laurent Duperval wrote:
> Does this allow creating more connections? At some point, the discussion
> became too technical for me, and I no longer could tell if the answer
> was for developers or for users.

Yeah, it did become something of an investigation into the problem which
probably should have been moved to -hackers. I summarised the info in
the FAQ

http://www.postgresql.org/docs/faqs.FAQ_windows.html#4.4

for user consumption, and included a link to the MS Knowledgebase
article that shows what to tweak in the registry.

> I saw other messages dealing with semaphores/connection relations, etc.
> But unless I really did not understand the discussion, none of them seemed
> to address the issue I was seeing.

Yes, that was all about how we were using threads to manage interprocess
communications. We found a far more efficient way to do that, but my
guess is that that's not your problem.

> I'm thinking that the Java driver combined with Hibernate may be keeping
> handles open for too long, because my tests aren't supposed to maintain
> connections open for very long. I also would expect the connections to
> either be closed or released once the statements are executed.

That could be an issue with Hibernate or the other code you're running,
but yes, if it's opening lots of connections and keeping them open, that
could be what's wrong, and I would suggest checking the FAQ above.

Regards, Dave