Thread: APC/socket fix (final?)

APC/socket fix (final?)

From
Claudio Natoli
Date:
For application to HEAD.

This should take care of most, if not all, cases where a backend can be
interrupted in a blocking socket operation by a signal which itself performs
a socket operation (which interacts badly with our APC-based signal
implementation).

---
Certain disclaimers and policies apply to all email sent from Memetrics.
For the full text of these disclaimers and policies see
<a
href="http://www.memetrics.com/emailpolicy.html">http://www.memetrics.com/em
ailpolicy.html</a>



Attachment

Re: APC/socket fix (final?)

From
"Magnus Hagander"
Date:
Thanks. I was getting to that, but hadn't started :-)

Per our discussion off-list, I agree with this method, and the patch
looks fine to me.

//Magnus

> -----Original Message-----
> From: Claudio Natoli [mailto:claudio.natoli@memetrics.com]
> Sent: Thursday, March 25, 2004 2:07 AM
> To: 'pgsql-patches@postgresql.org'
> Subject: [PATCHES] APC/socket fix (final?)
>
>
>
> For application to HEAD.
>
> This should take care of most, if not all, cases where a
> backend can be interrupted in a blocking socket operation by
> a signal which itself performs a socket operation (which
> interacts badly with our APC-based signal implementation).
>
> ---
> Certain disclaimers and policies apply to all email sent from
> Memetrics. For the full text of these disclaimers and policies see
> <a
> href="http://www.memetrics.com/emailpolicy.html">http://www.me
> metrics.com/em
> ailpolicy.html</a>
>
>
>

Re: APC/socket fix (final?)

From
Tom Lane
Date:
Claudio Natoli <claudio.natoli@memetrics.com> writes:
> + #ifdef WIN32
> +     /* Interrupted by socket/APC interaction? */
> +     if (n < 0 && GetLastError() == ERROR_IO_PENDING)
> +         errno = EINTR;
> + #endif

This seems a bit schizophrenic; if you can assign to errno, why can't you
read from it?  Would look more consistent if the code looked like

    if (n < 0 && errno == ERROR_IO_PENDING)
        errno = EINTR;

            regards, tom lane

Re: APC/socket fix (final?)

From
"Magnus Hagander"
Date:
>Claudio Natoli <claudio.natoli@memetrics.com> writes:
>> + #ifdef WIN32
>> +     /* Interrupted by socket/APC interaction? */
>> +     if (n < 0 && GetLastError() == ERROR_IO_PENDING)
>> +         errno = EINTR;
>> + #endif
>
>This seems a bit schizophrenic; if you can assign to errno,
>why can't you
>read from it?  Would look more consistent if the code looked like
>
>    if (n < 0 && errno == ERROR_IO_PENDING)
>        errno = EINTR;
>

The problem is that winsock does not *set* the errno variable in the
case when it's interrupted by an APC. errno is left at 0.
GetLastError() is the win32 replacement for errno. In most cases errno
is set correctly when you use the "unix api functions", but not in this
case (which is arguably a bug, but it's there nevertheless). If you use
"native win32 functions", you get GetLastError() set only.

//Magnus

Re: APC/socket fix (final?)

From
Claudio Natoli
Date:

> Would look more consistent if the code looked like
>
>     if (n < 0 && errno == ERROR_IO_PENDING)
>         errno = EINTR;

It would be more consistent, but unfortunately GetLastError() != errno.

---
Certain disclaimers and policies apply to all email sent from Memetrics.
For the full text of these disclaimers and policies see
<a
href="http://www.memetrics.com/emailpolicy.html">http://www.memetrics.com/em
ailpolicy.html</a>

Re: APC/socket fix (final?)

From
Tom Lane
Date:
Claudio Natoli <claudio.natoli@memetrics.com> writes:
> It would be more consistent, but unfortunately GetLastError() != errno.

Yeah, I saw Magnus' explanation.  So essentially this is a workaround
for a bug in Windows' select() emulation.

Rather than hoping that we'll remember to decorate every select() call
with this workaround, would it make sense to use a wrapper function?
I'm loath to invent pg_select() but it might be cleaner than this.

            regards, tom lane

Re: APC/socket fix (final?)

From
Claudio Natoli
Date:
> Rather than hoping that we'll remember to decorate every select() call
> with this workaround, would it make sense to use a wrapper function?
> I'm loath to invent pg_select() but it might be cleaner than this.

We'd also need pg_recv() and pg_send(). Chances are it can happen with every
blocking socket call :-(

Cheers,
Claudio

---
Certain disclaimers and policies apply to all email sent from Memetrics.
For the full text of these disclaimers and policies see
<a
href="http://www.memetrics.com/emailpolicy.html">http://www.memetrics.com/em
ailpolicy.html</a>

Re: APC/socket fix (final?)

From
Tom Lane
Date:
Claudio Natoli <claudio.natoli@memetrics.com> writes:
>> I'm loath to invent pg_select() but it might be cleaner than this.

> We'd also need pg_recv() and pg_send(). Chances are it can happen with every
> blocking socket call :-(

Ugh.  Is there a way we can insert a wrapper layer without modifying the
call sites?  I'm thinking of some kind of macro hack, say

    #ifdef WIN32
    #define select(...)  pg_select(...)
    #endif

and then provide a port module that goes roughly like

    #undef select

    pg_select(...)
    {
        foo = select(...);
        // fix errno here;
        return foo;
    }

The fewer places that have to know about this sort of thing, the better
off we will be.

            regards, tom lane

Re: APC/socket fix (final?)

From
Claudio Natoli
Date:

> Ugh.  Is there a way we can insert a wrapper layer without modifying the
> call sites?  I'm thinking of some kind of macro hack, say
> [snip]

Sure. Think we've even done this before (also, prevents developers needing
to remember to use pg_*).

The reason I think it was avoided for select(), in preference for a thread
to invoke the socket op during the signal/APC, was a fear that perhaps the
Windows Sockets internals could get mashed. AFAICS, the discussion Magnus
had with the Microsoft guys (and, from memory, those I've had with Magnus
off-list) suggests this isn't true. If mashing the internals is still a
possibility, then clearly the patch I've submitted might do more harm than
good.

(Magnus, can you confirm?)

If so, I'll submit a patch for select/recv/send over the weekend, which will
also remove the recent fixes for pgstat.

Cheers,
Claudio

---
Certain disclaimers and policies apply to all email sent from Memetrics.
For the full text of these disclaimers and policies see
<a
href="http://www.memetrics.com/emailpolicy.html">http://www.memetrics.com/em
ailpolicy.html</a>

Re: APC/socket fix (final?)

From
"Magnus Hagander"
Date:
>> Ugh.  Is there a way we can insert a wrapper layer without
>modifying the
>> call sites?  I'm thinking of some kind of macro hack, say
>> [snip]
>
>Sure. Think we've even done this before (also, prevents
>developers needing to remember to use pg_*).

Yup, it's done to redefine kill() to pqkill(). See include/port/win32.h.



>The reason I think it was avoided for select(), in preference
>for a thread
>to invoke the socket op during the signal/APC, was a fear that
>perhaps the
>Windows Sockets internals could get mashed. AFAICS, the
>discussion Magnus
>had with the Microsoft guys (and, from memory, those I've had
>with Magnus
>off-list) suggests this isn't true. If mashing the internals is still a
>possibility, then clearly the patch I've submitted might do
>more harm than
>good.
>
>(Magnus, can you confirm?)

Yes, it can be done (for select, so I guess for recv and send). It is
recommended that we do not do it that way, though.

There is also a third option, which I'm leaning more and more towards. I
was in that direction in the beginning, but turned away from it thinking
it would be too much work. Also, I was under the impression that we
would rather have some "workarounds checking errors etc" than
reimplement code. I'm now getting the idea that we'd rather have it the
other way around, assuming the code is in the port dir. Fair enough.


The third option is to redefine all these functions into our own, and
implement our own emulation layer. This means our own select(), send(),
recv() (more? I don't think so). And have these call the native winsock
APIs (WSAEventSelect(), WSASend(), WSARecv() etc). These functions are
designed to work in an APC environment.

This is going to be a bit more code that can go wrong, but it carries
the advatages that (1) it's win32 only code, and (2) the code will all
be in the port directory. We implement it along with the standard Unix
semantics, so the main code should not notice. Doing this, we can back
out the changes done so far (the thread fix, the errno fix).

It's the "sledgehammer fix", but I think it's probably time to bite the
bullet and do this one. This way, it's *not* a workaround. Any surprises
we end up with should be from our own code.


>If so, I'll submit a patch for select/recv/send over the
>weekend, which will
>also remove the recent fixes for pgstat.

If you think my suggestion above is not a good one then yes, we can do
this one. But I don't feel all that good about it (and specifically, I
have not had it confirmed about recv() and send() yet - have asked, but
not received a response yet).

If not, I'm going to get started on an implementation using the WSA
functions for you to check over :-)

//Magnus

Re: APC/socket fix (final?)

From
"Magnus Hagander"
Date:
>> Hopeless, or cute, work-around?
>
>It's possibly workable in the limited context of the postmaster, but
>I've got doubts about doing it in libpq where we can't assume we know
>what the surrounding application will do.

No need to touch the frontend parts at all. Our APCs are server side
only, so it's not a problem there.

It should be just as with the kill defrine in win32.h - "#ifndef
FRONTEND".


To the other point - yes, it's a cute workaround. I'm a bit afraid that
we'll end up with another surprise down the road though, since the
behaviour of the select() function is known to be undefined in this
situation (or at leas tthe return value of it is). I'm not sure we can
rely on the pattern we're seeing now. We also don't know for sure about
send() and recv() (though I'm working on that).

I'm almost done with the "our own select()/recv()/send() functions"
attempt. I feel that's a more correct solution (not just a workaround),
*if* I can get the code good enough that we feel comfortable with it. If
not, then a workaround seems the way to go, and then Claudios one looks
good (except we might need to go with a global variable instead - but
the general way, and the localization of it looks good)


//Magnus

Re: APC/socket fix (final?)

From
Tom Lane
Date:
"Magnus Hagander" <mha@sollentuna.net> writes:
>>> No need to touch the frontend parts at all. Our APCs are server side
>>> only, so it's not a problem there.
>>
>> Oh really?  What if the surrounding app uses APCs?

> Then I'd say it's that apps problem. And it's clearly nothing new - the
> problem has existed before.

Not if the app itself doesn't use the broken select() code.  In that
scenario, it's *our* problem and not the app's.

> I don't think it's a good idea in general to redefine something as
> fundamental as select, send, recv etc from libpq-fe.h (or files included
> from there).

Certainly not; the redefinition would have to be in files that are not
part of the exported API.  However this is not difficult.  port.h is not
included by libpq-fe.h.

            regards, tom lane

Re: APC/socket fix (final?)

From
"Magnus Hagander"
Date:
>>>> Hopeless, or cute, work-around?
>>>
>>> It's possibly workable in the limited context of the postmaster, but
>>> I've got doubts about doing it in libpq where we can't
>assume we know
>>> what the surrounding application will do.
>
>> No need to touch the frontend parts at all. Our APCs are server side
>> only, so it's not a problem there.
>
>Oh really?  What if the surrounding app uses APCs?

Then I'd say it's that apps problem. And it's clearly nothing new - the
problem has existed before. We'd certainly want to document it, though.

I don't think it's a good idea in general to redefine something as
fundamental as select, send, recv etc from libpq-fe.h (or files included
from there). That will affect any and all socket calls in the program in
question, including those that aren't in use in libpq. If we want to
affect those, we need to find a way to do it without exposing it to the
client apps.


//Magnus

Re: APC/socket fix (final?)

From
"Magnus Hagander"
Date:
>> I don't think it's a good idea in general to redefine something as
>> fundamental as select, send, recv etc from libpq-fe.h (or
>files included
>> from there).
>
>Certainly not; the redefinition would have to be in files that are not
>part of the exported API.  However this is not difficult.
>port.h is not
>included by libpq-fe.h.

I just realised that. I read too many files in parallell and got the
idea that it was included in libpq-fe.h. My mistake, sorry.

Then yes, we probably want to get that thing done for libpq as well.
Though socket operations in APCs arne't exactly common, we know for a
fact that they can give you interesting and incorrect results :-)
Whatever method we choose should probably be applied to the frontend as
well, yes.

//Magnus

Re: APC/socket fix (final?)

From
Tom Lane
Date:
Claudio Natoli <claudio.natoli@memetrics.com> writes:
> Hopeless, or cute, work-around?

It's possibly workable in the limited context of the postmaster, but
I've got doubts about doing it in libpq where we can't assume we know
what the surrounding application will do.

            regards, tom lane

Re: APC/socket fix (final?)

From
Claudio Natoli
Date:
Tom Lane wrote:
> > No need to touch the frontend parts at all. Our APCs are server side
> > only, so it's not a problem there.
>
> Oh really?  What if the surrounding app uses APCs?

To my mind, that's actually a vote *for* the work-around, and in /port (in
order to have it propagate to all front-end code as well).

Cheers,
Claudio

---
Certain disclaimers and policies apply to all email sent from Memetrics.
For the full text of these disclaimers and policies see
<a
href="http://www.memetrics.com/emailpolicy.html">http://www.memetrics.com/em
ailpolicy.html</a>

Re: APC/socket fix (final?)

From
Tom Lane
Date:
"Magnus Hagander" <mha@sollentuna.net> writes:
>>> Hopeless, or cute, work-around?
>>
>> It's possibly workable in the limited context of the postmaster, but
>> I've got doubts about doing it in libpq where we can't assume we know
>> what the surrounding application will do.

> No need to touch the frontend parts at all. Our APCs are server side
> only, so it's not a problem there.

Oh really?  What if the surrounding app uses APCs?

            regards, tom lane

Re: APC/socket fix (final?)

From
Claudio Natoli
Date:

> Tom Lane wrote:
> > What happens if the socket() call fails?
>
> Then we return in error. Refer to attachment.

Doh. You *were* referring to the attachment. Ok, so it'll need a check to
make sure the socket actually does get created, but this isn't a fatal flaw,
right? [is a work in progress I bashed out about 30 minutes ago].

I'm actually pretty hopeful over this patch, passes regression tests just
fine, lets us remove the win32_pgstat cruft, let's us easily mirror the unix
semantics, ...

Hopeless, or cute, work-around?

Cheers,
Claudio

---
Certain disclaimers and policies apply to all email sent from Memetrics.
For the full text of these disclaimers and policies see
<a
href="http://www.memetrics.com/emailpolicy.html">http://www.memetrics.com/em
ailpolicy.html</a>

Re: APC/socket fix (final?)

From
Claudio Natoli
Date:
Tom Lane wrote:
> Claudio Natoli <claudio.natoli@memetrics.com> writes:
> > ... However, it just occured to me that
> > we could wrap select() by augmenting the read_mask with an
> addition socket,
> > that we know will never be touched, and checking this
> socket on a "valid"
> > return. If this socket is still set, we know we got bitten
> by the APC/socket
> > interaction bug, and can set errno accordingly.
>
> What happens if the socket() call fails?

Then we return in error. Refer to attachment.


> Even if it succeeds, I don't know what the semantics are of selecting on
an un-bound socket ... it
> might perhaps show as error state, for instance.

select doesn't need a socket to be bound to select on it, afaik. In any
case, it isn't necessary under win32, which is what we are discussing.

Cheers,
Claudio

---
Certain disclaimers and policies apply to all email sent from Memetrics.
For the full text of these disclaimers and policies see
<a
href="http://www.memetrics.com/emailpolicy.html">http://www.memetrics.com/em
ailpolicy.html</a>

Re: APC/socket fix (final?)

From
Tom Lane
Date:
Claudio Natoli <claudio.natoli@memetrics.com> writes:
> ... However, it just occured to me that
> we could wrap select() by augmenting the read_mask with an addition socket,
> that we know will never be touched, and checking this socket on a "valid"
> return. If this socket is still set, we know we got bitten by the APC/socket
> interaction bug, and can set errno accordingly.

What happens if the socket() call fails?  Even if it succeeds, I don't
know what the semantics are of selecting on an un-bound socket ... it
might perhaps show as error state, for instance.

            regards, tom lane

Re: APC/socket fix (final?)

From
Claudio Natoli
Date:
Hi all,

Magnus Hagander wrote:
> The third option is to redefine all these functions into our own, and
> implement our own emulation layer. This means our own select(), send(),
> recv() (more? I don't think so). And have these call the native winsock
> APIs (WSAEventSelect(), WSASend(), WSARecv() etc). These functions are
> designed to work in an APC environment.

I personally think this is the last possible option; I'd much rather stay as
close to *nix as possible. As you said, and I agree whole-heartedly that
there is a lot more code that can go wrong, and it seems unnecessary if we
have a suitable workaround, which I think we might.


> If you think my suggestion above is not a good one then yes, we can do
> this one. But I don't feel all that good about it (and specifically, I
> have not had it confirmed about recv() and send() yet - have asked, but
> not received a response yet).

Reading back through our discussions, I'm not at all concerned about recv()
+ send(), as they are well behaved. When interrupted, they SetLastError()
reasonably, and return < 0.

select(), however, is a different story. On APC interrupt, it returns a
seemingly valid value, doesn't set any of
errno/WSAGetLastError/GetLastError, and leaves the FD_SETs set, which we
concluded was impossible to work-around. However, it just occured to me that
we could wrap select() by augmenting the read_mask with an addition socket,
that we know will never be touched, and checking this socket on a "valid"
return. If this socket is still set, we know we got bitten by the APC/socket
interaction bug, and can set errno accordingly.

I've attached a proposal for /port. Comments?

Cheers,
Claudio


---
Certain disclaimers and policies apply to all email sent from Memetrics.
For the full text of these disclaimers and policies see
<a
href="http://www.memetrics.com/emailpolicy.html">http://www.memetrics.com/em
ailpolicy.html</a>



Attachment

Re: APC/socket fix (final?)

From
Tom Lane
Date:
"Magnus Hagander" <mha@sollentuna.net> writes:
> I *think* we can do a better job with a not-so-large amount of effort,
> ...
> I'm willing to give it a try. If it turns out to be a lot of work, or
> way too much code, I'll drop the idea.

Sure, give it a shot and see what you get.

            regards, tom lane

Re: APC/socket fix (final?)

From
"Magnus Hagander"
Date:
>> The third option is to redefine all these functions into our own, and
>> implement our own emulation layer. This means our own
>select(), send(),
>> recv() (more? I don't think so). And have these call the
>native winsock
>> APIs (WSAEventSelect(), WSASend(), WSARecv() etc). These
>functions are
>> designed to work in an APC environment.
>
>Pardon me for not having paid close enough attention, but what versions
>of select() and friends are we relying on now?  Is it really reasonable
>to think that we can do a better job without a large amount of effort?


oh yeah, I should add that these "new-style" functions are very similar
to the original functions in syntax and semantics - they're just changed
to interface with the win32 style of blocking/signalling/waiting.


//Magnus

Re: APC/socket fix (final?)

From
Tom Lane
Date:
"Magnus Hagander" <mha@sollentuna.net> writes:
> The third option is to redefine all these functions into our own, and
> implement our own emulation layer. This means our own select(), send(),
> recv() (more? I don't think so). And have these call the native winsock
> APIs (WSAEventSelect(), WSASend(), WSARecv() etc). These functions are
> designed to work in an APC environment.

Pardon me for not having paid close enough attention, but what versions
of select() and friends are we relying on now?  Is it really reasonable
to think that we can do a better job without a large amount of effort?

            regards, tom lane

Re: APC/socket fix (final?)

From
Claudio Natoli
Date:
> Your patch has been added to the PostgreSQL unapplied patches list at:

Bruce, please cancel the original patch.

The attached patch is for (possible) application to HEAD, following
meritocratic "contest" [socket.c for src/port]

Been iterating with Magnus extensively off-list, and he is well on the way
to an engineered approach. Let's wait and see where that gets to, and then
search for community consensus between the two approaches.

Cheers,
Claudio

PS. Tested extensively, but if anyone wants to patch their own source base
and try it out any feedback would be most welcome.

---
Certain disclaimers and policies apply to all email sent from Memetrics.
For the full text of these disclaimers and policies see
<a
href="http://www.memetrics.com/emailpolicy.html">http://www.memetrics.com/em
ailpolicy.html</a>



Attachment

Re: APC/socket fix (final?)

From
Bruce Momjian
Date:
This patch has been withdrawn by the author.

---------------------------------------------------------------------------

Claudio Natoli wrote:
>
> For application to HEAD.
>
> This should take care of most, if not all, cases where a backend can be
> interrupted in a blocking socket operation by a signal which itself performs
> a socket operation (which interacts badly with our APC-based signal
> implementation).
>
> ---
> Certain disclaimers and policies apply to all email sent from Memetrics.
> For the full text of these disclaimers and policies see
> <a
> href="http://www.memetrics.com/emailpolicy.html">http://www.memetrics.com/em
> ailpolicy.html</a>
>
>

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>       joining column's datatypes do not match

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073