Thread: What is happening on buildfarm member baiji?

What is happening on buildfarm member baiji?

From

Tom Lane

Date:

12 May 2007, 23:06:20

The last two runs on baiji have failed at the installcheck stage,
with symptoms that look a heck of a lot like the most recent system
catalog changes haven't taken effect (eg, it doesn't seem to know
about pg_type.typarray).  Given that the previous "check" step
passed, the most likely explanation seems to be that some part
of the "install" step failed --- I've not tried to reproduce the
behavior but it looks like it might be explained if the install
target's postgres.bki file was not getting overwritten.  So we
have two issues: what exactly is going wrong (some new form of
Vista brain death no doubt), and why isn't the buildfarm script
noticing?
        regards, tom lane

Re: What is happening on buildfarm member baiji?

From

"Andrew Dunstan"

Date:

13 May 2007, 01:58:30

Tom Lane wrote:
> The last two runs on baiji have failed at the installcheck stage,
> with symptoms that look a heck of a lot like the most recent system
> catalog changes haven't taken effect (eg, it doesn't seem to know
> about pg_type.typarray).  Given that the previous "check" step
> passed, the most likely explanation seems to be that some part
> of the "install" step failed --- I've not tried to reproduce the
> behavior but it looks like it might be explained if the install
> target's postgres.bki file was not getting overwritten.  So we
> have two issues: what exactly is going wrong (some new form of
> Vista brain death no doubt), and why isn't the buildfarm script
> noticing?
>


The script will not even run if the install directory exists:
  die "$buildroot/$branch has $pgsql or inst directories!"
    if ((!$from_source && -d $pgsql) || -d "inst");

But the install process is different for MSVC. It could be that we are
screwing up there.

I no longer have an MSVC box, so I can't tell so easily ;-(

cheers

andrew

Re: What is happening on buildfarm member baiji?

From

Magnus Hagander

Date:

13 May 2007, 07:34:54

Andrew Dunstan wrote:
> Tom Lane wrote:
>> The last two runs on baiji have failed at the installcheck stage,
>> with symptoms that look a heck of a lot like the most recent system
>> catalog changes haven't taken effect (eg, it doesn't seem to know
>> about pg_type.typarray).  Given that the previous "check" step
>> passed, the most likely explanation seems to be that some part
>> of the "install" step failed --- I've not tried to reproduce the
>> behavior but it looks like it might be explained if the install
>> target's postgres.bki file was not getting overwritten.  So we
>> have two issues: what exactly is going wrong (some new form of
>> Vista brain death no doubt), and why isn't the buildfarm script
>> noticing?
>>
> 
> 
> The script will not even run if the install directory exists:
> 
>   die "$buildroot/$branch has $pgsql or inst directories!"
>     if ((!$from_source && -d $pgsql) || -d "inst");
> 
> But the install process is different for MSVC. It could be that we are
> screwing up there.

Uh, but that piece of code you're referring to is from the buildfarm
code, right? Isn't it the same?


> I no longer have an MSVC box, so I can't tell so easily ;-(

Non-Vista MSVC boxes seem to pass fine (mastodon and skylark, for
example - skylark fails on something completely different, not fully
investigated yet, but looks to be a buildfarm problem rather than a
backend one), so I don't think it's the MSVC procedure alone that's the
cause of it.

//Magnus

Re: What is happening on buildfarm member baiji?

From

"Andrew Dunstan"

Date:

13 May 2007, 08:51:45

Magnus Hagander wrote:
> Andrew Dunstan wrote:
>> Tom Lane wrote:
>>> The last two runs on baiji have failed at the installcheck stage,
>>> with symptoms that look a heck of a lot like the most recent system
>>> catalog changes haven't taken effect (eg, it doesn't seem to know
>>> about pg_type.typarray).  Given that the previous "check" step
>>> passed, the most likely explanation seems to be that some part
>>> of the "install" step failed --- I've not tried to reproduce the
>>> behavior but it looks like it might be explained if the install
>>> target's postgres.bki file was not getting overwritten.  So we
>>> have two issues: what exactly is going wrong (some new form of
>>> Vista brain death no doubt), and why isn't the buildfarm script
>>> noticing?
>>>
>>
>>
>> The script will not even run if the install directory exists:
>>
>>   die "$buildroot/$branch has $pgsql or inst directories!"
>>     if ((!$from_source && -d $pgsql) || -d "inst");
>>
>> But the install process is different for MSVC. It could be that we are
>> screwing up there.
>
> Uh, but that piece of code you're referring to is from the buildfarm
> code, right? Isn't it the same?


Yes, but it might be that the MSVC install doesn't actually use that
location properly. Unfortunately, its logging is less than verbose, unlike
the standard install procedure.

>
>
>> I no longer have an MSVC box, so I can't tell so easily ;-(
>
> Non-Vista MSVC boxes seem to pass fine (mastodon and skylark, for
> example - skylark fails on something completely different, not fully
> investigated yet, but looks to be a buildfarm problem rather than a
> backend one), so I don't think it's the MSVC procedure alone that's the
> cause of it.
>
>

Possibly. My point was that I can't even investigate how MSVC is working
at all.

cheers

andrew

Re: What is happening on buildfarm member baiji?

From

Magnus Hagander

Date:

13 May 2007, 10:01:16

Andrew Dunstan wrote:
> Magnus Hagander wrote:
>> Andrew Dunstan wrote:
>>> Tom Lane wrote:
>>>> The last two runs on baiji have failed at the installcheck stage,
>>>> with symptoms that look a heck of a lot like the most recent system
>>>> catalog changes haven't taken effect (eg, it doesn't seem to know
>>>> about pg_type.typarray).  Given that the previous "check" step
>>>> passed, the most likely explanation seems to be that some part
>>>> of the "install" step failed --- I've not tried to reproduce the
>>>> behavior but it looks like it might be explained if the install
>>>> target's postgres.bki file was not getting overwritten.  So we
>>>> have two issues: what exactly is going wrong (some new form of
>>>> Vista brain death no doubt), and why isn't the buildfarm script
>>>> noticing?
>>>>
>>>
>>> The script will not even run if the install directory exists:
>>>
>>>   die "$buildroot/$branch has $pgsql or inst directories!"
>>>     if ((!$from_source && -d $pgsql) || -d "inst");
>>>
>>> But the install process is different for MSVC. It could be that we are
>>> screwing up there.
>> Uh, but that piece of code you're referring to is from the buildfarm
>> code, right? Isn't it the same?
> 
> 
> Yes, but it might be that the MSVC install doesn't actually use that
> location properly. Unfortunately, its logging is less than verbose, unlike
> the standard install procedure.
> 
>>
>>> I no longer have an MSVC box, so I can't tell so easily ;-(
>> Non-Vista MSVC boxes seem to pass fine (mastodon and skylark, for
>> example - skylark fails on something completely different, not fully
>> investigated yet, but looks to be a buildfarm problem rather than a
>> backend one), so I don't think it's the MSVC procedure alone that's the
>> cause of it.
>>
>>
> 
> Possibly. My point was that I can't even investigate how MSVC is working
> at all.

So what is it you're looking for, specifically, to help with that?

//Magnus

Re: What is happening on buildfarm member baiji?

From

"Andrew Dunstan"

Date:

13 May 2007, 10:15:47

Magnus Hagander wrote:
>> My point was that I can't even investigate how MSVC is working
>> at all.
>
> So what is it you're looking for, specifically, to help with that?
>
>

As a very bare minimum, we need to change the installation procedure to
log its destination.

Unless that has somehow got screwed up I can't see how Tom's theory of a
possibly left over .bki file can stand up.

cheers

andrew

Re: What is happening on buildfarm member baiji?

From

Magnus Hagander

Date:

13 May 2007, 12:22:27

Andrew Dunstan wrote:
> Magnus Hagander wrote:
>>> My point was that I can't even investigate how MSVC is working
>>> at all.
>> So what is it you're looking for, specifically, to help with that?
>>
>>
> 
> As a very bare minimum, we need to change the installation procedure to
> log its destination.
> 
> Unless that has somehow got screwed up I can't see how Tom's theory of a
> possibly left over .bki file can stand up.

Just to be clear, are you looking for something as simple as this?

Index: Install.pm
===================================================================
RCS file: /cvsroot/pgsql/src/tools/msvc/Install.pm,v
retrieving revision 1.14
diff -c -r1.14 Install.pm
*** Install.pm  25 Apr 2007 19:00:05 -0000      1.14
--- Install.pm  13 May 2007 15:21:51 -0000
***************
*** 35,41 ****
          $conf = "release";
      }
      die "Could not find debug or release binaries" if ($conf eq "");
!     print "Installing for $conf\n";
      EnsureDirectories($target,
'bin','lib','share','share/timezonesets','share/contrib','doc',
          'doc/contrib', 'symbols');
--- 35,41 ----
          $conf = "release";
      }
      die "Could not find debug or release binaries" if ($conf eq "");
!     print "Installing for $conf in $target\n";
      EnsureDirectories($target,
'bin','lib','share','share/timezonesets','share/contrib','doc',
          'doc/contrib', 'symbols');



//Magnus

Re: What is happening on buildfarm member baiji?

From

Andrew Dunstan

Date:

13 May 2007, 12:29:29


Magnus Hagander wrote:
> !     print "Installing for $conf in $target\n";
>
>       
>   

Looks like a good place to start, sure.

cheers

andrew

Re: What is happening on buildfarm member baiji?

From

Magnus Hagander

Date:

13 May 2007, 12:33:24

Andrew Dunstan wrote:
> 
> 
> Magnus Hagander wrote:
>> !     print "Installing for $conf in $target\n";
>>
>>         
> 
> Looks like a good place to start, sure.

Ok. Applied.

//Magnus

Re: What is happening on buildfarm member baiji?

From

Tom Lane

Date:

13 May 2007, 14:04:07

"Andrew Dunstan" <andrew@dunslane.net> writes:
> Unless that has somehow got screwed up I can't see how Tom's theory of a
> possibly left over .bki file can stand up.

Well, I tried inserting a .bki file from April 30 into a HEAD
installation, and that made it dump core during bootstrap, so that
offhand theory was wrong.

However, when I run the HEAD regression tests against that entire
April 30 installation tree, I can duplicate the baiji regression diffs
almost exactly --- the polymorphism test fails for me where it succeeds
on baiji, which I think indicates that baiji has the patch I applied on
May 1 for SQL function inlining.

So I now state fairly confidently that baiji is failing to overwrite
*any* of the installation tree, /share and /bin both, and instead is
testing an installation dating from sometime between May 1 and May 11.
Have there been any recent changes in either the buildfarm script or
the MSVC install code that might have changed where the install is
supposed to go?
        regards, tom lane

Re: What is happening on buildfarm member baiji?

From

Andrew Dunstan

Date:

13 May 2007, 18:32:34

Tom Lane wrote:
> "Andrew Dunstan" <andrew@dunslane.net> writes:
>   
>> Unless that has somehow got screwed up I can't see how Tom's theory of a
>> possibly left over .bki file can stand up.
>>     
>
> Well, I tried inserting a .bki file from April 30 into a HEAD
> installation, and that made it dump core during bootstrap, so that
> offhand theory was wrong.
>
> However, when I run the HEAD regression tests against that entire
> April 30 installation tree, I can duplicate the baiji regression diffs
> almost exactly --- the polymorphism test fails for me where it succeeds
> on baiji, which I think indicates that baiji has the patch I applied on
> May 1 for SQL function inlining.
>
> So I now state fairly confidently that baiji is failing to overwrite
> *any* of the installation tree, /share and /bin both, and instead is
> testing an installation dating from sometime between May 1 and May 11.
> Have there been any recent changes in either the buildfarm script or
> the MSVC install code that might have changed where the install is
> supposed to go?
>
>   

Not to my knowledge, but I have no method of testing what's going on, 
and I hate guessing like this - in fact this is what has worried me all 
along about supporting MSVC builds - we always said we didn't want to 
have to have 2 build environments, but now we have two and we'll be 
supporting them forever, even though one of them is not used by 95% of 
our developers. I realise that MSVC builds are likely to perform better, 
but we have now got a situation where we are likely to have breakage on 
a regular basis, ISTM.

(sorry to grumble - it's been a very frustrating 24 hours)

cheers

andrew

Re: What is happening on buildfarm member baiji?

From

Dave Page

Date:

14 May 2007, 05:06:00

Andrew Dunstan wrote:
> 
> Not to my knowledge, but I have no method of testing what's going on,
> and I hate guessing like this - in fact this is what has worried me all
> along about supporting MSVC builds - we always said we didn't want to
> have to have 2 build environments, but now we have two and we'll be
> supporting them forever, even though one of them is not used by 95% of
> our developers. I realise that MSVC builds are likely to perform better,
> but we have now got a situation where we are likely to have breakage on
> a regular basis, ISTM.

It's not just that they perform better - we also get a debugger that
actually works well (yes, I know newer gdb's apparently do work on
Mingw; but even a fully functional GDB doesn't come close to VC++), but
more importantly it's looking more and more like it'll be our only way
of producing a 64bit build for Windows.

> (sorry to grumble - it's been a very frustrating 24 hours)

:-(

Regards, Dave.

Re: What is happening on buildfarm member baiji?

From

Dave Page

Date:

14 May 2007, 07:02:01

Tom Lane wrote:

> So I now state fairly confidently that baiji is failing to overwrite
> *any* of the installation tree, /share and /bin both, and instead is
> testing an installation dating from sometime between May 1 and May 11.

Close. There was an Msys build from the 9th running on port 5432.

So, it seems there are a couple of issues here:

1) There appears to be no way to specify the default port number in the
MSVC build. The buildfarm passes it to configure for regular builds,
which obviously isn't run in VC++ mode, thus leaving the build on 5432.

2) VC++ and Msys builds will both happily start on the same port at the
same time. The first one to start listens on 5432 until it shuts down,
at which point the second server takes over seamlessly! It doesn't
matter which is started first - it's as if Windows is queuing up the
listens on the port.

Confusingly, the similar behaviour is reproducible on XP Pro, except the
connection seems to go to the last server started, instead of the first!

Regards, Dave

Re: What is happening on buildfarm member baiji?

From

"Zeugswetter Andreas ADI SD"

Date:

14 May 2007, 07:56:33

> Close. There was an Msys build from the 9th running on port 5432.

> 2) VC++ and Msys builds will both happily start on the same
> port at the same time. The first one to start listens on 5432
> until it shuts down, at which point the second server takes
> over seamlessly! It doesn't matter which is started first -
> it's as if Windows is queuing up the listens on the port.

Um, we explicitly set SO_REUSEADDR. So the port will happily allow a
second bind.

http://support.microsoft.com/kb/307175 quote:
"If you use SO_REUSEADDR to bind multiple servers to the same port at the
same time, only one random listening socket accepts a connection
request."

Andreas

Re: What is happening on buildfarm member baiji?

From

Dave Page

Date:

14 May 2007, 08:47:31

Zeugswetter Andreas ADI SD wrote:
>> Close. There was an Msys build from the 9th running on port 5432.
> 
>> 2) VC++ and Msys builds will both happily start on the same 
>> port at the same time. The first one to start listens on 5432 
>> until it shuts down, at which point the second server takes 
>> over seamlessly! It doesn't matter which is started first - 
>> it's as if Windows is queuing up the listens on the port.
> 
> Um, we explicitly set SO_REUSEADDR. So the port will happily allow a
> second bind.

So we do. I must confess I didn't look at the code, just spoke with
Magnus who agreed it didn't seem like it should be possible.

Regards, Dave

Re: What is happening on buildfarm member baiji?

From

Andrew Dunstan

Date:

14 May 2007, 09:09:41


Dave Page wrote:
> Tom Lane wrote:
>
>   
>> So I now state fairly confidently that baiji is failing to overwrite
>> *any* of the installation tree, /share and /bin both, and instead is
>> testing an installation dating from sometime between May 1 and May 11.
>>     
>
> Close. There was an Msys build from the 9th running on port 5432.
>
> So, it seems there are a couple of issues here:
>
> 1) There appears to be no way to specify the default port number in the
> MSVC build. The buildfarm passes it to configure for regular builds,
> which obviously isn't run in VC++ mode, thus leaving the build on 5432.
>
> 2) VC++ and Msys builds will both happily start on the same port at the
> same time. The first one to start listens on 5432 until it shuts down,
> at which point the second server takes over seamlessly! It doesn't
> matter which is started first - it's as if Windows is queuing up the
> listens on the port.
>
> Confusingly, the similar behaviour is reproducible on XP Pro, except the
>  connection seems to go to the last server started, instead of the first!
>
>
>   

I'll look at the port mess.

Are you running 2 buildfarm members on the same machine? If so, you 
should look at using the multi-root facility which is explicitly 
designed to avoid clashes of this sort.

cheers

andrew

Re: What is happening on buildfarm member baiji?

From

Tom Lane

Date:

14 May 2007, 09:29:25

Dave Page <dpage@postgresql.org> writes:
> 2) VC++ and Msys builds will both happily start on the same port at the
> same time. The first one to start listens on 5432 until it shuts down,
> at which point the second server takes over seamlessly!

Uh ... so the lock-file stuff is completely broken on Windows?

The SO_REUSEADDR flag is intentional --- without that, on many
platforms there would be a significant time delay needed between
stopping a postmaster and starting a new one.  But our socket lock
file machinery ought to have detected the conflict.
        regards, tom lane

Re: What is happening on buildfarm member baiji?

From

Dave Page

Date:

14 May 2007, 09:33:21

Andrew Dunstan wrote:
> 
> 
> I'll look at the port mess.
> 
> Are you running 2 buildfarm members on the same machine? If so, you
> should look at using the multi-root facility which is explicitly
> designed to avoid clashes of this sort.

Yes, I've got VC++ and Mingw/Msys animals on each of two (virtual)
machines. Each is completely independent of each other - different
configs, different scripts, different ports, different directories etc.

Where can I find out about multi-root? I can't see anything in the
config file, or in PGBuildFarm-HOWTO.txt

Regards, Dave.

Re: What is happening on buildfarm member baiji?

From

Tom Lane

Date:

14 May 2007, 09:51:36

I wrote:
> Uh ... so the lock-file stuff is completely broken on Windows?

Not so much broken as commented out ... on looking at the code, it's
blindingly obvious that we don't even try to create a socket lock file
if not HAVE_UNIX_SOCKETS.  Sigh.

There is a related risk even on Unix machines: two postmasters can be
started on the same port number if they have different settings of
unix_socket_directory, and then it's indeterminate which one you will
contact if you connect to the TCP port.  I seem to recall that we
discussed this several years ago, and didn't really find a satisfactory
way of interlocking the TCP port per se.
        regards, tom lane
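
To illustrate the interlock gap described here, a minimal sketch, assuming a lock file that is created atomically in the socket directory. The helper names `lock_path`, `try_lock` and `demo` are hypothetical, and the file name is only modelled on the socket lock file; this is not the postmaster's actual code, which also has to deal with stale lock files. The point it shows: a lock keyed on the socket directory catches a second server in the same directory, but says nothing about a server using a different directory with the same TCP port.

```python
import os
import tempfile


def lock_path(socket_dir, port):
    # Illustrative naming scheme: one lock file per (socket directory, port).
    return os.path.join(socket_dir, f".s.PGSQL.{port}.lock")


def try_lock(socket_dir, port):
    """Atomically create the lock file; return False if it already exists."""
    try:
        fd = os.open(lock_path(socket_dir, port),
                     os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(f"{os.getpid()}\n")   # record the owner for later inspection
    return True


def demo():
    with tempfile.TemporaryDirectory() as dir_a, \
         tempfile.TemporaryDirectory() as dir_b:
        first = try_lock(dir_a, 5432)      # first server takes the lock
        same_dir = try_lock(dir_a, 5432)   # same directory: conflict detected
        other_dir = try_lock(dir_b, 5432)  # other directory: no conflict seen
        return first, same_dir, other_dir
```

Calling `demo()` returns one flag per attempt. The third attempt succeeding is the risk in question: both "servers" would believe they own port 5432.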

Re: What is happening on buildfarm member baiji?

From

Magnus Hagander

Date:

14 May 2007, 09:56:43

On Mon, May 14, 2007 at 08:50:54AM -0400, Tom Lane wrote:
> I wrote:
> > Uh ... so the lock-file stuff is completely broken on Windows?
> 
> Not so much broken as commented out ... on looking at the code, it's
> blindingly obvious that we don't even try to create a socket lock file
> if not HAVE_UNIX_SOCKETS.  Sigh.
> 
> There is a related risk even on Unix machines: two postmasters can be
> started on the same port number if they have different settings of
> unix_socket_directory, and then it's indeterminate which one you will
> contact if you connect to the TCP port.  I seem to recall that we
> discussed this several years ago, and didn't really find a satisfactory
> way of interlocking the TCP port per se.

If all we want to do is add a check that prevents two servers from starting on
the same port, we could do that trivially in a win32 specific way (since
we'll never have unix sockets there). Just create an object in the global
namespace named postgresql.interlock.<portnumber> or such a thing.

Worth doing?

//Magnus
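
A rough sketch of what such an interlock could look like, assuming a named mutex is used as the "object" and that the `Global\` prefix is what is meant by the global namespace; both are assumptions, and the helper names `interlock_name` and `acquire_interlock` are hypothetical. It is written in Python via ctypes purely for illustration and only does anything on Windows. It relies on `CreateMutexW` returning a valid handle with the last error set to `ERROR_ALREADY_EXISTS` when the name is already taken.

```python
import sys

ERROR_ALREADY_EXISTS = 183  # Win32 error code


def interlock_name(port):
    # Name pattern taken from the proposal; the exact spelling is illustrative.
    return f"postgresql.interlock.{port}"


def acquire_interlock(port):
    """Return a handle if we are the first holder of the interlock for
    this port, or None if another process already holds it (Windows only)."""
    if sys.platform != "win32":
        raise NotImplementedError("named kernel objects are Windows-only")

    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.CreateMutexW.argtypes = [wintypes.LPVOID, wintypes.BOOL,
                                      wintypes.LPCWSTR]
    kernel32.CreateMutexW.restype = wintypes.HANDLE
    kernel32.CloseHandle.argtypes = [wintypes.HANDLE]

    handle = kernel32.CreateMutexW(None, False,
                                   "Global\\" + interlock_name(port))
    if not handle:
        raise ctypes.WinError(ctypes.get_last_error())
    if ctypes.get_last_error() == ERROR_ALREADY_EXISTS:
        # Someone else created the object first: give up our reference.
        kernel32.CloseHandle(handle)
        return None
    return handle  # keep it open for the life of the server process
```

The handle is deliberately never closed on the success path: the kernel object lives only as long as some process holds a handle to it, so the interlock is released when the owning process exits.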

Re: What is happening on buildfarm member baiji?

From

Tom Lane

Date:

14 May 2007, 10:02:52

Magnus Hagander <magnus@hagander.net> writes:
> If all we want to do is add a check that prevents two servers from starting on
> the same port, we could do that trivially in a win32 specific way (since
> we'll never have unix sockets there). Just create an object in the global
> namespace named postgresql.interlock.<portnumber> or such a thing.

Does it go away automatically on postmaster crash?
        regards, tom lane

Re: What is happening on buildfarm member baiji?

From

Stephen Frost

Date:

14 May 2007, 10:06:28

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> There is a related risk even on Unix machines: two postmasters can be
> started on the same port number if they have different settings of
> unix_socket_directory, and then it's indeterminate which one you will
> contact if you connect to the TCP port.  I seem to recall that we
> discussed this several years ago, and didn't really find a satisfactory
> way of interlocking the TCP port per se.

I'm curious as to which Unix systems allow multiple processes to listen
on the same port at the same time..  On Linux, and I thought on most,
you get an EADDRINUSE on the listen() call (which the postmaster should
pick up on and bomb out, which it may already).
Thanks,
    Stephen

Re: What is happening on buildfarm member baiji?

From

Dave Page

Date:

14 May 2007, 10:15:59

Stephen Frost wrote:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> There is a related risk even on Unix machines: two postmasters can be
>> started on the same port number if they have different settings of
>> unix_socket_directory, and then it's indeterminate which one you will
>> contact if you connect to the TCP port.  I seem to recall that we
>> discussed this several years ago, and didn't really find a satisfactory
>> way of interlocking the TCP port per se.
> 
> I'm curious as to which Unix systems allow multiple processes to listen
> on the same port at the same time..  On Linux, and I thought on most,
> you get an EADDRINUSE on the listen() call (which the postmaster should
> pick up on and bomb out, which it may already).

Linux certainly does. Windows seems to treat SO_REUSEADDR in the same
way as SO_REUSEPORT which just seems wrong.

Regards, Dave.

Re: What is happening on buildfarm member baiji?

From

Gregory Stark

Date:

14 May 2007, 10:16:57

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> I wrote:
>> Uh ... so the lock-file stuff is completely broken on Windows?
>
> Not so much broken as commented out ... on looking at the code, it's
> blindingly obvious that we don't even try to create a socket lock file
> if not HAVE_UNIX_SOCKETS.  Sigh.

Isn't the socket lock file only there to protect the socket?

> There is a related risk even on Unix machines: two postmasters can be
> started on the same port number if they have different settings of
> unix_socket_directory, and then it's indeterminate which one you will
> contact if you connect to the TCP port.  I seem to recall that we
> discussed this several years ago, and didn't really find a satisfactory
> way of interlocking the TCP port per se.

stark@oxford:~/src/local-concurrent-psql/pgsql/src/bin/psql$ /usr/local/pgsql/bin/postgres -D /var/tmp/db2
LOG:  could not bind IPv4 socket: Address already in use
HINT:  Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
WARNING:  could not create listen socket for "localhost"
FATAL:  could not create any TCP/IP sockets

Is it possible the previous discussion related to servers with IPv6 where they
did manage to bind to one but not the other?

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com

Re: What is happening on buildfarm member baiji?

From

Magnus Hagander

Date:

14 May 2007, 10:20:00

On Mon, May 14, 2007 at 09:02:10AM -0400, Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > If all we want to do is add a check that prevents two servers from starting on
> > the same port, we could do that trivially in a win32 specific way (since
> > we'll never have unix sockets there). Just create an object in the global
> > namespace named postgresql.interlock.<portnumber> or such a thing.
> 
> Does it go away automatically on postmaster crash?

Yes.

//Magnus

Re: What is happening on buildfarm member baiji?

From

Andrew Dunstan

Date:

14 May 2007, 10:34:20


Magnus Hagander wrote:
> On Mon, May 14, 2007 at 09:02:10AM -0400, Tom Lane wrote:
>   
>> Magnus Hagander <magnus@hagander.net> writes:
>>     
>>> If all we want to do is add a check that prevents two servers from starting on
>>> the same port, we could do that trivially in a win32 specific way (since
>>> we'll never have unix sockets there). Just create an object in the global
>>> namespace named postgresql.interlock.<portnumber> or such a thing.
>>>       
>> Does it go away automatically on postmaster crash?
>>     
>
> Yes.
>
>
>   

Then I think it's worth adding, and I'd argue that as a low risk safety 
measure we should allow it to sneak into 8.3. I'm assuming the code 
involved will be quite small.

cheers

andrew

Re: What is happening on buildfarm member baiji?

From

Tom Lane

Date:

14 May 2007, 10:41:52

Dave Page <dpage@postgresql.org> writes:
> Stephen Frost wrote:
>> I'm curious as to which Unix systems allow multiple processes to listen
>> on the same port at the same time..  On Linux, and I thought on most,
>> you get an EADDRINUSE on the listen() call (which the postmaster should
>> pick up on and bomb out, which it may already).

> Linux certainly does.

Mmm, you're right, I misread the man page:
    Setting the SO_REUSEADDR option allows the local socket address to be
    reused in subsequent calls to bind().  This permits multiple
    SOCK_STREAM sockets to be bound to the same local address, as long as
    all existing sockets with the desired local address are in a connected
    state before bind() is called for a new socket.

The bit about "connected state" is relevant here --- a listening socket
isn't connected.  Time for more caffeine.

> Windows seems to treat SO_REUSEADDR in the same
> way as SO_REUSEPORT which just seems wrong.

Well, Microsoft getting standards wrong is no surprise.  So what do we
want to do about it?
        regards, tom lane
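
For anyone who wants to check the behaviour on their own platform, a minimal sketch, assuming a loopback interface is available. The helper name `second_listener_can_bind` is hypothetical. It opens one listening socket with SO_REUSEADDR set, then tries to bind a second socket with the same option to the same address and port. On Linux the second bind() is expected to fail with EADDRINUSE, in line with the "connected state" wording of the man page; going by the reports upthread, Windows would be expected to let it through.

```python
import errno
import socket
import sys


def second_listener_can_bind(host="127.0.0.1"):
    """Return True if a second SO_REUSEADDR socket can bind to a port
    that already has a listening socket on it."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        first.bind((host, 0))          # port 0: let the OS pick a free port
        first.listen(5)
        port = first.getsockname()[1]

        second.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            second.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False           # the conflict was detected
            raise
        return True                    # both sockets share the port
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    print(sys.platform, "second bind allowed:", second_listener_can_bind())
```

The sketch only reports whether the second bind is accepted; it does not try to show which of the two listeners would receive a connection.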

Re: What is happening on buildfarm member baiji?

From

Andrew Dunstan

Date:

14 May 2007, 10:48:09


Tom Lane wrote:
>      Setting the SO_REUSEADDR option allows the local socket address to be
>      reused in subsequent calls to bind().  This permits multiple
>      SOCK_STREAM sockets to be bound to the same local address, as long as
>      all existing sockets with the desired local address are in a connected
>      state before bind() is called for a new socket.
>
> The bit about "connected state" is relevant here --- a listening socket
> isn't connected.  Time for more caffeine.
>
>   

That's what I thought it meant. I am glad to see that I am not quite as 
out of date as I thought I must be reading upthread :-)

cheers

andrew

Re: What is happening on buildfarm member baiji?

From

Dave Page

Date:

14 May 2007, 10:49:01

Tom Lane wrote:
>> Windows seems to treat SO_REUSEADDR in the same
>> way as SO_REUSEPORT which just seems wrong.
> 
> Well, Microsoft getting standards wrong is no surprise.  So what do we
> want to do about it?

Microsoft did lift that code from BSD many moons ago, so it might be
worth checking if the bug actually originated there.

Assuming it didn't, then Magnus' idea sounds good to me.

Regards, Dave

Re: What is happening on buildfarm member baiji?

From

Alvaro Herrera

Date:

14 May 2007, 10:49:11

Andrew Dunstan wrote:
> 
> 
> Magnus Hagander wrote:
> >On Mon, May 14, 2007 at 09:02:10AM -0400, Tom Lane wrote:
> >  
> >>Magnus Hagander <magnus@hagander.net> writes:
> >>    
> >>>If all we want to do is add a check that prevents two servers from starting on
> >>>the same port, we could do that trivially in a win32 specific way (since
> >>>we'll never have unix sockets there). Just create an object in the global
> >>>namespace named postgresql.interlock.<portnumber> or such a thing.
> >>>      
> >>Does it go away automatically on postmaster crash?
> >
> >Yes.
> 
> Then I think it's worth adding, and I'd argue that as a low risk safety 
> measure we should allow it to sneak into 8.3. I'm assuming the code 
> involved will be quite small.

Do you actually mean 8.2 here?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: What is happening on buildfarm member baiji?

From

Tom Lane

Date:

14 May 2007, 10:50:29

Andrew Dunstan <andrew@dunslane.net> writes:
> Magnus Hagander wrote:
>>> If all we want to do is add a check that prevents two servers from starting on
>>> the same port, we could do that trivially in a win32 specific way (since
>>> we'll never have unix sockets there). Just create an object in the global
>>> namespace named postgresql.interlock.<portnumber> or such a thing.

> Then I think it's worth adding, and I'd argue that as a low risk safety 
> measure we should allow it to sneak into 8.3. I'm assuming the code 
> involved will be quite small.

What happens if we just "#ifndef WIN32" the setsockopt(SO_REUSEADDR)
call?  I believe the reason that's in there is that some platforms will
reject bind() to a previously-used address for a TCP timeout delay after
a previous postmaster quit, but if that doesn't happen on Windows then
maybe all we need is to not set the option.
        regards, tom lane
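
The shape of that idea, sketched in Python rather than the backend's C, with a hypothetical helper `make_listen_socket`: set SO_REUSEADDR everywhere except on Windows, so that a second bind to an in-use port there would be expected to fail instead of silently succeeding. Whether a restarted server on Windows can then rebind promptly is exactly the open question, and this sketch does not answer it.

```python
import socket
import sys


def make_listen_socket(host="127.0.0.1", port=0):
    """Create a listening TCP socket, setting SO_REUSEADDR on every
    platform except Windows."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if sys.platform != "win32":
            # Lets a restarted server rebind without waiting for old
            # connections in TIME_WAIT to expire.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        sock.listen(5)
    except OSError:
        sock.close()
        raise
    return sock
```

A caller would use it as `srv = make_listen_socket(port=5432)` and close the socket at shutdown; the default of port 0 just asks the OS for any free port, which is convenient for testing.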

Re: What is happening on buildfarm member baiji?

From

Magnus Hagander

Date:

14 May 2007, 10:53:51

On Mon, May 14, 2007 at 09:49:47AM -0400, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
> > Magnus Hagander wrote:
> >>> If all we want to do is add a check that prevents two servers from starting on
> >>> the same port, we could do that trivially in a win32 specific way (since
> >>> we'll never have unix sockets there). Just create an object in the global
> >>> namespace named postgresql.interlock.<portnumber> or such a thing.
> 
> > Then I think it's worth adding, and I'd argue that as a low risk safety 
> > measure we should allow it to sneak into 8.3. I'm assuming the code 
> > involved will be quite small.
> 
> What happens if we just "#ifndef WIN32" the setsockopt(SO_REUSEADDR)
> call?  I believe the reason that's in there is that some platforms will
> reject bind() to a previously-used address for a TCP timeout delay after
> a previous postmaster quit, but if that doesn't happen on Windows then
> maybe all we need is to not set the option.

I think that at least used to happen on Windows in earlier versions.

//Magnus

Re: What is happening on buildfarm member baiji?

From

Magnus Hagander

Date:

14 May 2007, 10:55:01

On Mon, May 14, 2007 at 09:34:05AM -0400, Andrew Dunstan wrote:
>
>
> Magnus Hagander wrote:
> >On Mon, May 14, 2007 at 09:02:10AM -0400, Tom Lane wrote:
> >
> >>Magnus Hagander <magnus@hagander.net> writes:
> >>
> >>>If all we want to do is add a check that prevents two servers to start on
> >>>the same port, we could do that trivially in a win32 specific way (since
> >>>we'll never have unix sockets there). Just create an object in the global
> >>>namespace named postgresql.interlock.<portnumber> or such a thing.
> >>>
> >>Does it go away automatically on postmaster crash?
> >>
> >
> >Yes.
> >
> >
> >
>
> Then I think it's worth adding, and I'd argue that as a low risk safety
> measure we should allow it to sneak into 8.3. I'm assuming the code
> involved will be quite small.

Yes, see attached.

BTW, did you mean 8.2? One typical case where this could happen is in an
upgrade scenario, I think...

//Magnus

Attachment

win32_interlock.patch

Re: What is happening on buildfarm member baiji?

From

Tom Lane

Date:

14 May 2007, 10:58:56

Magnus Hagander <magnus@hagander.net> writes:
> On Mon, May 14, 2007 at 09:49:47AM -0400, Tom Lane wrote:
>> What happens if we just "#ifndef WIN32" the setsockopt(SO_REUSEADDR)
>> call?  I believe the reason that's in there is that some platforms will
>> reject bind() to a previously-used address for a TCP timeout delay after
>> a previous postmaster quit, but if that doesn't happen on Windows then
>> maybe all we need is to not set the option.

> I think that at least used to happen on Windows in earlier versions.

Well, we'd have to check the behavior of the proposed global object on
every supported Windows version too, so we might as well check the
simpler solution while we're at it.
        regards, tom lane

Re: What is happening on buildfarm member baiji?

From

Gregory Stark

Date:

14 May 2007, 11:00:25

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> What happens if we just "#ifndef WIN32" the setsockopt(SO_REUSEADDR)
> call?  I believe the reason that's in there is that some platforms will
> reject bind() to a previously-used address for a TCP timeout delay after
> a previous postmaster quit, but if that doesn't happen on Windows then
> maybe all we need is to not set the option.

Well it's worth checking. But whereas Windows breaking our understanding of
what SO_REUSEADDR does doesn't actually violate any specification, not having
a TIME_WAIT state at all would certainly violate the TCP spec. So it's
somewhat unlikely that that's what they're doing. But anything's possible.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com

Re: What is happening on buildfarm member baiji?

From

Tom Lane

Date:

14 May 2007, 11:21:21

Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> What happens if we just "#ifndef WIN32" the setsockopt(SO_REUSEADDR)
>> call?  I believe the reason that's in there is that some platforms will
>> reject bind() to a previously-used address for a TCP timeout delay after
>> a previous postmaster quit, but if that doesn't happen on Windows then
>> maybe all we need is to not set the option.

> Well it's worth checking. But whereas Windows breaking our understanding of
> what SO_REUSEADDR does doesn't actually violate any specification, not having
> a TIME_WAIT state at all would certainly violate the TCP spec. So it's
> somewhat unlikely that that's what they're doing. But anything's possible.

This is not a behavior required by the TCP spec AFAICS.  Also, in a
quick test neither Linux nor HPUX appear to need SO_REUSEADDR --- on
both, I can restart the postmaster immediately without it.

[ digs in CVS and archives for awhile... ]  An interesting historical
point is that the SO_REUSEADDR call did not appear in the original
Berkeley Postgres95 sources.  It was added in rev 1.2 of pqcomm.c,
for which the only comment is
Finished merging in src/backend from Dr. George's source tree

so the fact is that that code has undergone approximately 0 specific
peer review.  I'm beginning to wonder if we really need it at all.
I thought I recalled us having discussed the need for it once, but I
cannot find any trace of such a discussion.
        regards, tom lane

Re: What is happening on buildfarm member baiji?

From

Tom Lane

Date:

14 May 2007, 11:58:10

Aidan Van Dyk <aidan@highrise.ca> writes:
> * Tom Lane <tgl@sss.pgh.pa.us> [070514 10:24]:
>> This is not a behavior required by the TCP spec AFAICS.  Also, in a
>> quick test neither Linux nor HPUX appear to need SO_REUSEADDR --- on
>> both, I can restart the postmaster immediately without it.

> Did you have an active connection before restarting?
> In HylaFAX, we had the same situation and went to using SO_REUSEADDR:
>     http://bugs.hylafax.org/show_bug.cgi?id=217

Um, you're right, I hadn't done the test properly.  If I have an open
psql session across TCP and do pg_ctl stop -m fast, then I can't
start a new postmaster until the socket goes out of CLOSE_WAIT state.
Which, if I just leave the psql session sit there, seems to mean
"indefinitely" ... so it's even worse than just a TCP timeout.

So the notion of not using SO_REUSEADDR seems a nonstarter, and we
probably have to go with Magnus' global-object hack.
        regards, tom lane

Re: What is happening on buildfarm member baiji?

From

Aidan Van Dyk

Date:

14 May 2007, 11:58:51

* Tom Lane <tgl@sss.pgh.pa.us> [070514 10:24]:
> This is not a behavior required by the TCP spec AFAICS.  Also, in a
> quick test neither Linux nor HPUX appear to need SO_REUSEADDR --- on
> both, I can restart the postmaster immediately without it.

Did you have an active connection before restarting?

In HylaFAX, we had the same situation and went to using SO_REUSEADDR:
    http://bugs.hylafax.org/show_bug.cgi?id=217

The problem appears if there *was* a connection, and the server was
stopped.  Then the server can't bind again until the TIME_WAIT
connection goes away.  Using SO_REUSEADDR allows the new server to
listen again right away.

a.

-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.
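
The behaviour described above can be tried with a small POSIX program. This is only a minimal sketch, not code from pqcomm.c: the helper name open_listener and its reuse_addr flag are hypothetical, and it binds to the loopback address only. Passing reuse_addr = 0 is how one would reproduce the "cannot bind until the old connection goes away" case after stopping a server that had a client attached.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a TCP listening socket on 127.0.0.1:port.
 * If reuse_addr is nonzero, SO_REUSEADDR is set before bind().
 * Returns the socket descriptor, or -1 on failure.
 */
static int
open_listener(unsigned short port, int reuse_addr)
{
    struct sockaddr_in addr;
    int         fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    if (reuse_addr)
    {
        int         one = 1;

        /* The option only affects bind(), so it must be set first. */
        if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0)
        {
            close(fd);
            return -1;
        }
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        listen(fd, 16) < 0)
    {
        close(fd);
        return -1;
    }
    return fd;
}
```

Port 0 asks the kernel for any free port, which is convenient for a quick check; a real restart test would use a fixed port.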

Re: What is happening on buildfarm member baiji?

From

Gregory Stark

Date:

14 May 2007, 12:03:58

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <stark@enterprisedb.com> writes:
>> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>>> What happens if we just "#ifndef WIN32" the setsockopt(SO_REUSEADDR)
>>> call?  I believe the reason that's in there is that some platforms will
>>> reject bind() to a previously-used address for a TCP timeout delay after
>>> a previous postmaster quit, but if that doesn't happen on Windows then
>>> maybe all we need is to not set the option.
>
>> Well it's worth checking. But whereas Windows breaking our understanding of
>> what SO_REUSEADDR does doesn't actually violate any specification, not having
>> a TIME_WAIT state at all would certainly violate the TCP spec. So it's
>> somewhat unlikely that that's what they're doing. But anything's possible.
>
> This is not a behavior required by the TCP spec AFAICS.  Also, in a
> quick test neither Linux nor HPUX appear to need SO_REUSEADDR --- on
> both, I can restart the postmaster immediately without it.

It certainly is, observe on page 55 of RFC 793 for the "Open" call in the
example API:

TIME-WAIT STATE
     Return "error:  connection already exists".


> so the fact is that that code has undergone approximately 0 specific
> peer review.  I'm beginning to wonder if we really need it at all.
> I thought I recalled us having discussed the need for it once, but I
> cannot find any trace of such a discussion.

It's certainly standard in Unix coding to have the server set SO_REUSEADDR and
the client not set it.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com

Re: What is happening on buildfarm member baiji?

From

Tom Lane

Date:

14 May 2007, 12:29:28

Magnus Hagander <magnus@hagander.net> writes:
> +         sprintf(mutexName,"postgresql.interlock.%i", portNumber);

That won't do; it should be legal for two postmasters to listen on
different IP addresses using the same port number.  So you need to
include some representation of the IP address being bound to.

> +         if (GetLastError() == ERROR_ALREADY_EXISTS)
> +             ereport(FATAL,
> +                     (errcode(ERRCODE_LOCK_FILE_EXISTS),
> +                      errmsg("interlock mutex \"%s\" already exists", mutexName),
> +                      errhint("Is another postgres listening on port %i", portNumber)));

ereport(FATAL) is quite inappropriate here.  Do the same thing that
bind() failure would do, ie, ereport(LOG) and continue the loop.
Also, you probably need to think about cleaning up the mutex in
case one of the later steps of socket-acquisition fails.  We should
only be holding locks on addresses we've successfully bound.
        regards, tom lane
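
The first review point, that the interlock name has to identify the address as well as the port, might look roughly like the sketch below. It is an illustration under assumptions, not the revised patch: build_interlock_name is a hypothetical helper, and how the listen address is rendered as a string (here a plain textual address) is also an assumption.

```c
#include <stdio.h>

/*
 * Hypothetical helper: build a per-address interlock name such as
 * "postgresql.interlock.127.0.0.1.5432", so that two postmasters bound
 * to different addresses on the same port get different names.
 * Returns 0 on success, -1 if the name does not fit in buf.
 */
static int
build_interlock_name(char *buf, size_t buflen, const char *addr, int port)
{
    int         n = snprintf(buf, buflen, "postgresql.interlock.%s.%d",
                             addr, port);

    return (n >= 0 && (size_t) n < buflen) ? 0 : -1;
}
```

On Windows the resulting name would then be handed to whatever call creates the named object. Following the rest of the review, a name collision there would be reported with ereport(LOG) and the loop would move on to the next address rather than exiting.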

Re: What is happening on buildfarm member baiji?

From

Gregory Stark

Date:

14 May 2007, 12:38:38

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Um, you're right, I hadn't done the test properly.  If I have an open
> psql session across TCP and do pg_ctl stop -m fast, then I can't
> start a new postmaster until the socket goes out of CLOSE_WAIT state.
> Which, if I just leave the psql session sit there, seems to mean
> "indefinitely" ... so it's even worse than just a TCP timeout.

That's still not quite right. Are you running the client and server on the
same machine? Shutting down the server should put its connection in FIN_WAIT1
which would immediately go to FIN_WAIT2 if psql is still reachable. I think
the connection you're seeing in CLOSE_WAIT is the client's end of the
connection.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com

Re: What is happening on buildfarm member baiji?

From

Tom Lane

Date:

14 May 2007, 12:43:37

Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> Um, you're right, I hadn't done the test properly.  If I have an open
>> psql session across TCP and do pg_ctl stop -m fast, then I can't
>> start a new postmaster until the socket goes out of CLOSE_WAIT state.
>> Which, if I just leave the psql session sit there, seems to mean
>> "indefinitely" ... so it's even worse than just a TCP timeout.

> That's still not quite right. Are you running the client and server on the
> same machine?

Yeah.  The behavior might well be different if they're on different
machines ... but it's moot in any case, since the point is that without
SO_REUSEADDR we have at least an exposure to a TCP-timeout delay before
being able to start a new postmaster.
        regards, tom lane

Re: What is happening on buildfarm member baiji?

From

Andrew Dunstan

Date:

14 May 2007, 13:39:18

Dave Page wrote:
>
> Where can I find out about multi-root? I can't see anything in the
> config file, or in PGBuildFarm-HOWTO.txt
>
>
>   

It's a hack I want to get rid of. It's a command-line option:
 --multiroot               = allow several members to use same build root

Of course, at least part of our problem is that the MSVC build is not 
honoring port settings at all (and buildfarm isn't setting the port for 
MSVC anyway). Magnus and I will work on that - it's a serious deficiency.

(refrains from whining again about 2 build systems)

cheers

andrew

Re: What is happening on buildfarm member baiji?

From

Andrew - Supernews

Date:

14 May 2007, 14:27:00

On 2007-05-14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Aidan Van Dyk <aidan@highrise.ca> writes:
>> * Tom Lane <tgl@sss.pgh.pa.us> [070514 10:24]:
>>> This is not a behavior required by the TCP spec AFAICS.  Also, in a
>>> quick test neither Linux nor HPUX appear to need SO_REUSEADDR --- on
>>> both, I can restart the postmaster immediately without it.
>
>> Did you have an active connection before restarting?
>> In HylaFAX, we had the same situation and went to using SO_REUSEADDR:
>>     http://bugs.hylafax.org/show_bug.cgi?id=217
>
> Um, you're right, I hadn't done the test properly.  If I have an open
> psql session across TCP and do pg_ctl stop -m fast, then I can't
> start a new postmaster until the socket goes out of CLOSE_WAIT state.
> Which, if I just leave the psql session sit there, seems to mean
> "indefinitely" ... so it's even worse than just a TCP timeout.

SO_REUSEADDR is required in all cases where you bind a listening socket
to a specific port number. There are no exceptions to this rule.

This is an artifact of the Berkeley Sockets interface design, not something
inherent in the TCP spec. It arises because the sockets interface separates
the bind() and listen()/connect() calls; if you replace bind/listen/connect
with a single system call, then SO_REUSEADDR becomes unnecessary. (The
behaviour of bind() needs to be different depending on whether it will be
followed by listen() or connect(); this was not well understood by the
original designers of the API, hence the use of SO_REUSEADDR as a klugy
way of saying "I'm going to use listen() on this socket after the bind".)

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

Re: What is happening on buildfarm member baiji?

From

Andrew Dunstan

Date:

14 May 2007, 17:04:01

Andrew Dunstan wrote:
>
>
> Dave Page wrote:
>>
>> Where can I find out about multi-root? I can't see anything in the
>> config file, or in PGBuildFarm-HOWTO.txt
>>
>>
>>   
>
> It's a hack I want to get rid of. It's a command-line option:
>
>  --multiroot               = allow several members to use same build root
>
>
>

I have in fact just removed this in buildfarm CVS tip. That means that 
you can now run as many buildfarm members as you like against a single 
buildroot and they will not trip over each other.

We still have the MSVC port problem to fix though.

cheers

andrew

Re: What is happening on buildfarm member baiji?

From

Andrew Dunstan

Date:

14 May 2007, 21:21:11

Dave Page wrote:
>
> 1) There appears to be no way to specify the default port number in the
> MSVC build. The buildfarm passes it to configure for regular builds,
> which obviously isn't run in VC++ mode, thus leaving the build on 5432.
>
>
>   

I have committed fixes to both pgsql and buildfarm that should in 
combination cure this, I hope. Please test - there might still be loose 
ends hanging around.

cheers

andrew

Re: What is happening on buildfarm member baiji?

From

Dave Page

Date:

15 May 2007, 04:59:21

Andrew Dunstan wrote:
> 
> 
> Dave Page wrote:
>>
>> 1) There appears to be no way to specify the default port number in the
>> MSVC build. The buildfarm passes it to configure for regular builds,
>> which obviously isn't run in VC++ mode, thus leaving the build on 5432.
>>
>>
>>   
> 
> I have committed fixes to both pgsql and buildfarm that should in
> combination cure this, I hope. Please test - there might still be loose
> ends hanging around.

OK, thanks.

Regards, Dave.

Re: What is happening on buildfarm member baiji?

From

Bruce Momjian

Date:

17 May 2007, 20:18:22

Are we going to apply this?  I would also like to see a comment added on
why we use SO_REUSEADDR.

---------------------------------------------------------------------------

Magnus Hagander wrote:
> On Mon, May 14, 2007 at 09:34:05AM -0400, Andrew Dunstan wrote:
> > 
> > 
> > Magnus Hagander wrote:
> > >On Mon, May 14, 2007 at 09:02:10AM -0400, Tom Lane wrote:
> > >  
> > >>Magnus Hagander <magnus@hagander.net> writes:
> > >>    
> > >>>If all we want to do is add a check that prevents two servers to start on
> > >>>the same port, we could do that trivially in a win32 specific way (since
> > >>>we'll never have unix sockets there). Just create an object in the global
> > >>>namespace named postgresql.interlock.<portnumber> or such a thing.
> > >>>      
> > >>Does it go away automatically on postmaster crash?
> > >>    
> > >
> > >Yes.
> > >
> > >
> > >  
> > 
> > Then I think it's worth adding, and I'd argue that as a low risk safety 
> > measure we should allow it to sneak into 8.3. I'm assuming the code 
> > involved will be quite small.
> 
> Yes, see attached.
> 
> BTW, did you mean 8.2? One typical case where this could happen is in an
> upgrade scenario, I think...
> 
> //Magnus
> 

[ Attachment, skipping... ]


--  Bruce Momjian  <bruce@momjian.us>          http://momjian.us EnterpriseDB
http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +

Re: What is happening on buildfarm member baiji?

From

Tom Lane

Date:

17 May 2007, 20:52:15

Bruce Momjian <bruce@momjian.us> writes:
> Are we going to apply this?

Not in the form submitted so far, but I trust Magnus is working on
fixing it.
        regards, tom lane

Re: What is happening on buildfarm member baiji?

From

Magnus Hagander

Date:

18 May 2007, 04:31:31

On Thu, May 17, 2007 at 07:51:34PM -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Are we going to apply this?
> 
> Not in the form submitted so far, but I trust Magnus is working on
> fixing it.

I am. Most likely won't have time to look at it properly until after pgcon,
though.

//Magnus

Re: What is happening on buildfarm member baiji?

From

Magnus Hagander

Date:

03 June 2007, 18:41:31

Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> +         sprintf(mutexName,"postgresql.interlock.%i", portNumber);
> 
> That won't do; it should be legal for two postmasters to listen on
> different IP addresses using the same port number.  So you need to
> include some representation of the IP address being bound to.
> 
>> +         if (GetLastError() == ERROR_ALREADY_EXISTS)
>> +             ereport(FATAL,
>> +                     (errcode(ERRCODE_LOCK_FILE_EXISTS),
>> +                      errmsg("interlock mutex \"%s\" already exists", mutexName),
>> +                      errhint("Is another postgres listening on port %i", portNumber)));
> 
> ereport(FATAL) is quite inappropriate here.  Do the same thing that
> bind() failure would do, ie, ereport(LOG) and continue the loop.
> Also, you probably need to think about cleaning up the mutex in
> case one of the later steps of socket-acquisition fails.  We should
> only be holding locks on addresses we've successfully bound.

I've done some further research on this on Win32, and I've come up with
the following:

If I set the flag SO_EXCLUSIVEADDRUSE, I get the same behavior as on
Unix: Can only create one postmaster at a time on the same addr/port,
and if I close the backend with a psql session running I can't create a
new one until there is a timeout passed.

However, if I just *skip* setting SO_REUSEADDR completely, things seem
to work the way we want it. I cannot start more than one postmaster on
the same addr/port. If I start a psql, then terminate postmaster, I can
restart a new postmaster right away.

Given this, I propose we simply #ifdef out the SO_REUSEADDR on win32.
Anybody see a problem with this?

(A fairly good reference to read up on the options is at
http://msdn2.microsoft.com/en-us/library/ms740621.aspx - which
specifically talks about the issue seen on Unix as appearing with the
SO_EXCLUSIVEADDRUSE parameter, which agrees with my test results)

//Magnus
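
The proposal amounts to something like the following. This is a minimal sketch under assumptions rather than the committed change: set_listen_socket_options is a hypothetical function name, and the descriptor is passed as a plain int for brevity.

```c
#ifdef WIN32
#include <winsock2.h>
#else
#include <sys/socket.h>
#endif

/*
 * Hypothetical helper: apply per-platform options to a socket that is
 * about to be bound as a listening socket.  Returns 0 on success, -1 on
 * failure.
 */
static int
set_listen_socket_options(int fd)
{
#ifndef WIN32
    int         one = 1;

    /*
     * On Unix, SO_REUSEADDR lets a new server bind right away even while
     * connections from a previous server are still timing out.
     */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR,
                   (char *) &one, sizeof(one)) < 0)
        return -1;
#else
    /*
     * On Windows the option is deliberately not set: per the tests
     * described above, setting it would let a second server bind to the
     * same address and port.
     */
    (void) fd;
#endif
    return 0;
}
```

The caller would invoke this between socket() and bind(), as the existing setsockopt() call is today.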

Re: What is happening on buildfarm member baiji?

From

Andrew Dunstan

Date:

03 June 2007, 18:57:17


Magnus Hagander wrote:
>
> However, if I just *skip* setting SO_REUSEADDR completely, things seem
> to work the way we want it. I cannot start more than one postmaster on
> the same addr/port. If I start a psql, then terminate postmaster, I can
> restart a new postmaster right away.
>
> Given this, I propose we simply #ifdef out the SO_REUSEADDR on win32.
> Anybody see a problem with this?
>
>
>   

Is that true even if the backend crashes?

cheers

andrew

Re: What is happening on buildfarm member baiji?

From

Tom Lane

Date:

03 June 2007, 23:44:55

Andrew Dunstan <andrew@dunslane.net> writes:
> Magnus Hagander wrote:
>> Given this, I propose we simply #ifdef out the SO_REUSEADDR on win32.
>> Anybody see a problem with this?

> Is that true even if the backend crashes?

It would take a postmaster crash to make this an issue, and those are
pretty doggone rare.  Not that the question shouldn't be checked, but
we might decide to tolerate the problem if there is one ...
        regards, tom lane

Re: What is happening on buildfarm member baiji?

From

Tom Lane

Date:

04 June 2007, 00:30:15

Magnus Hagander <magnus@hagander.net> writes:
> Given this, I propose we simply #ifdef out the SO_REUSEADDR on win32.
> Anybody see a problem with this?

> (A fairly good reference to read up on the options is at
> http://msdn2.microsoft.com/en-us/library/ms740621.aspx

Hmm ... if accurate, that page says in words barely longer than one
syllable that Microsoft entirely misunderstands the intended meaning
of SO_REUSEADDR.

It looks like SO_EXCLUSIVEADDRUSE might be a bit closer to the standard
semantics; should we use that instead on Windoze?
        regards, tom lane

Re: What is happening on buildfarm member baiji?

From

Magnus Hagander

Date:

04 June 2007, 04:37:17

On Sun, Jun 03, 2007 at 11:29:33PM -0400, Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > Given this, I propose we simply #ifdef out the SO_REUSEADDR on win32.
> > Anybody see a problem with this?
> 
> > (A fairly good reference to read up on the options is at
> > http://msdn2.microsoft.com/en-us/library/ms740621.aspx
> 
> Hmm ... if accurate, that page says in words barely longer than one
> syllable that Microsoft entirely misunderstands the intended meaning
> of SO_REUSEADDR.

Yes, that's how I read it as well.

> It looks like SO_EXCLUSIVEADDRUSE might be a bit closer to the standard
> semantics; should we use that instead on Windoze?

I think you're reading something wrong. The way I read it,
SO_EXCLUSIVEADDRUSE gives us pretty much the same behavior we have on Unix
*without* SO_REUSEADDR. There's a paragraph specifically talking about the
problem of restarting a server having to wait for a timeout when using this
switch.

//Magnus

Re: What is happening on buildfarm member baiji?

From

Magnus Hagander

Date:

04 June 2007, 05:51:26

On Sun, Jun 03, 2007 at 10:44:13PM -0400, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
> > Magnus Hagander wrote:
> >> Given this, I propose we simply #ifdef out the SO_REUSEADDR on win32.
> >> Anybody see a problem with this?
> 
> > Is that true even if the backend crashes?
> 
> It would take a postmaster crash to make this an issue, and those are
> pretty doggone rare.  Not that the question shouldn't be checked, but
> we might decide to tolerate the problem if there is one ...

The closest I can get is a kill -9 on postmaster, and that does work. I
can't start a new postmaster while the old backend is running - because of
the shared memory detection stuff. But the second it's gone I can start a
new one, so it doesn't have that wait-until-timeout behavior.

Since that's expected behavior and there were no other complaints, I think
I'll go ahead and put this one in later today.

//Magnus

Re: What is happening on buildfarm member baiji?

From

"Zeugswetter Andreas ADI SD"

Date:

04 June 2007, 07:02:47

> > > Given this, I propose we simply #ifdef out the SO_REUSEADDR on win32.

I agree that this is what we should do.

> > > (A fairly good reference to read up on the options is at
> > > http://msdn2.microsoft.com/en-us/library/ms740621.aspx
> >
> > Hmm ... if accurate, that page says in words barely longer than one
> > syllable that Microsoft entirely misunderstands the intended meaning
> > of SO_REUSEADDR.
>
> Yes, that's how I read it as well.
>
> > It looks like SO_EXCLUSIVEADDRUSE might be a bit closer to the
> > standard semantics; should we use that instead on Windoze?
>
> I think you're reading something wrong. The way I read it,
> SO_EXCLUSIVEADDRUSE gives us pretty much the same behavior we have on Unix
> *without* SO_REUSEADDR. There's a paragraph specifically
> talking about the problem of restarting a server having to
> wait for a timeout when using this switch.

Yup, that switch is no good either.

Andreas