Thread: APC/socket fix (final?)
For application to HEAD. This should take care of most, if not all, cases where a backend can be interrupted in a blocking socket operation by a signal which itself performs a socket operation (which interacts badly with our APC-based signal implementation). --- Certain disclaimers and policies apply to all email sent from Memetrics. For the full text of these disclaimers and policies see <a href="http://www.memetrics.com/emailpolicy.html">http://www.memetrics.com/em ailpolicy.html</a>
Attachment
Thanks. I was getting to that, but hadn't started :-) Per our discussion off-list, I agree with this method, and the patch looks fine to me. //Magnus > -----Original Message----- > From: Claudio Natoli [mailto:claudio.natoli@memetrics.com] > Sent: Thursday, March 25, 2004 2:07 AM > To: 'pgsql-patches@postgresql.org' > Subject: [PATCHES] APC/socket fix (final?) > > > > For application to HEAD. > > This should take care of most, if not all, cases where a > backend can be interrupted in a blocking socket operation by > a signal which itself performs a socket operation (which > interacts badly with our APC-based signal implementation). > > --- > Certain disclaimers and policies apply to all email sent from > Memetrics. For the full text of these disclaimers and policies see > <a > href="http://www.memetrics.com/emailpolicy.html">http://www.me > metrics.com/em > ailpolicy.html</a> > > >
Claudio Natoli <claudio.natoli@memetrics.com> writes: > + #ifdef WIN32 > + /* Interrupted by socket/APC interaction? */ > + if (n < 0 && GetLastError() == ERROR_IO_PENDING) > + errno = EINTR; > + #endif This seems a bit schizophrenic; if you can assign to errno, why can't you read from it? Would look more consistent if the code looked like if (n < 0 && errno == ERROR_IO_PENDING) errno = EINTR; regards, tom lane
>Claudio Natoli <claudio.natoli@memetrics.com> writes: >> + #ifdef WIN32 >> + /* Interrupted by socket/APC interaction? */ >> + if (n < 0 && GetLastError() == ERROR_IO_PENDING) >> + errno = EINTR; >> + #endif > >This seems a bit schizophrenic; if you can assign to errno, >why can't you >read from it? Would look more consistent if the code looked like > > if (n < 0 && errno == ERROR_IO_PENDING) > errno = EINTR; > The problem is that winsock does not *set* the errno variable in the case when it's interrupted by an APC. errno is left at 0. GetLastError() is the win32 replacement for errno. In most cases errno is set correctly when you use the "unix api functions", but not in this case (which is arguably a bug, but it's there nevertheless). If you use "native win32 functions", you get GetLastError() set only. //Magnus
> Would look more consistent if the code looked like > > if (n < 0 && errno == ERROR_IO_PENDING) > errno = EINTR; It would be more consistent, but unfortunately GetLastError() != errno. --- Certain disclaimers and policies apply to all email sent from Memetrics. For the full text of these disclaimers and policies see <a href="http://www.memetrics.com/emailpolicy.html">http://www.memetrics.com/em ailpolicy.html</a>
Claudio Natoli <claudio.natoli@memetrics.com> writes: > It would be more consistent, but unfortunately GetLastError() != errno. Yeah, I saw Magnus' explanation. So essentially this is a workaround for a bug in Windows' select() emulation. Rather than hoping that we'll remember to decorate every select() call with this workaround, would it make sense to use a wrapper function? I'm loath to invent pg_select() but it might be cleaner than this. regards, tom lane
> Rather than hoping that we'll remember to decorate every select() call > with this workaround, would it make sense to use a wrapper function? > I'm loath to invent pg_select() but it might be cleaner than this. We'd also need pg_recv() and pg_send(). Chances are it can happen with every blocking socket call :-( Cheers, Claudio --- Certain disclaimers and policies apply to all email sent from Memetrics. For the full text of these disclaimers and policies see <a href="http://www.memetrics.com/emailpolicy.html">http://www.memetrics.com/em ailpolicy.html</a>
Claudio Natoli <claudio.natoli@memetrics.com> writes: >> I'm loath to invent pg_select() but it might be cleaner than this. > We'd also need pg_recv() and pg_send(). Chances are it can happen with every > blocking socket call :-( Ugh. Is there a way we can insert a wrapper layer without modifying the call sites? I'm thinking of some kind of macro hack, say #ifdef WIN32 #define select(...) pg_select(...) #endif and then provide a port module that goes roughly like #undef select pg_select(...) { foo = select(...); // fix errno here; return foo; } The fewer places that have to know about this sort of thing, the better off we will be. regards, tom lane
> Ugh. Is there a way we can insert a wrapper layer without modifying the > call sites? I'm thinking of some kind of macro hack, say > [snip] Sure. Think we've even done this before (also, prevents developers needing to remember to use pg_*). The reason I think it was avoided for select(), in preference for a thread to invoke the socket op during the signal/APC, was a fear that perhaps the Windows Sockets internals could get mashed. AFAICS, the discussion Magnus had with the Microsoft guys (and, from memory, those I've had with Magnus off-list) suggests this isn't true. If mashing the internals is still a possibility, then clearly the patch I've submitted might do more harm than good. (Magnus, can you confirm?) If so, I'll submit a patch for select/recv/send over the weekend, which will also remove the recent fixes for pgstat. Cheers, Claudio --- Certain disclaimers and policies apply to all email sent from Memetrics. For the full text of these disclaimers and policies see <a href="http://www.memetrics.com/emailpolicy.html">http://www.memetrics.com/em ailpolicy.html</a>
>> Ugh. Is there a way we can insert a wrapper layer without >modifying the >> call sites? I'm thinking of some kind of macro hack, say >> [snip] > >Sure. Think we've even done this before (also, prevents >developers needing to remember to use pg_*). Yup, it's done to redefine kill() to pqkill(). See include/port/win32.h. >The reason I think it was avoided for select(), in preference >for a thread >to invoke the socket op during the signal/APC, was a fear that >perhaps the >Windows Sockets internals could get mashed. AFAICS, the >discussion Magnus >had with the Microsoft guys (and, from memory, those I've had >with Magnus >off-list) suggests this isn't true. If mashing the internals is still a >possibility, then clearly the patch I've submitted might do >more harm than >good. > >(Magnus, can you confirm?) Yes, it can be done (for select, so I guess for recv and send). It is recommended that we do not do it that way, though. There is also a third option, which I'm leaning more and more towards. I was in that direction in the beginning, but turned away from it thinking it would be too much work. Also, I was under the impression that we would rather have some "workarounds checking errors etc" than reimplement code. I'm now getting the idea that we'd rather have it the other way around, assuming the code is in the port dir. Fair enough. The third option is to redefine all these functions into our own, and implement our own emulation layer. This means our own select(), send(), recv() (more? I don't think so). And have these call the native winsock APIs (WSAEventSelect(), WSASend(), WSARecv() etc). These functions are designed to work in an APC environment. This is going to be a bit more code that can go wrong, but it carries the advatages that (1) it's win32 only code, and (2) the code will all be in the port directory. We implement it along with the standard Unix semantics, so the main code should not notice. Doing this, we can back out the changes done so far (the thread fix, the errno fix). It's the "sledgehammer fix", but I think it's probably time to bite the bullet and do this one. This way, it's *not* a workaround. Any surprises we end up with should be from our own code. >If so, I'll submit a patch for select/recv/send over the >weekend, which will >also remove the recent fixes for pgstat. If you think my suggestion above is not a good one then yes, we can do this one. But I don't feel all that good about it (and specifically, I have not had it confirmed about recv() and send() yet - have asked, but not received a response yet). If not, I'm going to get started on an implementation using the WSA functions for you to check over :-) //Magnus
>> Hopeless, or cute, work-around? > >It's possibly workable in the limited context of the postmaster, but >I've got doubts about doing it in libpq where we can't assume we know >what the surrounding application will do. No need to touch the frontend parts at all. Our APCs are server side only, so it's not a problem there. It should be just as with the kill defrine in win32.h - "#ifndef FRONTEND". To the other point - yes, it's a cute workaround. I'm a bit afraid that we'll end up with another surprise down the road though, since the behaviour of the select() function is known to be undefined in this situation (or at leas tthe return value of it is). I'm not sure we can rely on the pattern we're seeing now. We also don't know for sure about send() and recv() (though I'm working on that). I'm almost done with the "our own select()/recv()/send() functions" attempt. I feel that's a more correct solution (not just a workaround), *if* I can get the code good enough that we feel comfortable with it. If not, then a workaround seems the way to go, and then Claudios one looks good (except we might need to go with a global variable instead - but the general way, and the localization of it looks good) //Magnus
"Magnus Hagander" <mha@sollentuna.net> writes: >>> No need to touch the frontend parts at all. Our APCs are server side >>> only, so it's not a problem there. >> >> Oh really? What if the surrounding app uses APCs? > Then I'd say it's that apps problem. And it's clearly nothing new - the > problem has existed before. Not if the app itself doesn't use the broken select() code. In that scenario, it's *our* problem and not the app's. > I don't think it's a good idea in general to redefine something as > fundamental as select, send, recv etc from libpq-fe.h (or files included > from there). Certainly not; the redefinition would have to be in files that are not part of the exported API. However this is not difficult. port.h is not included by libpq-fe.h. regards, tom lane
>>>> Hopeless, or cute, work-around? >>> >>> It's possibly workable in the limited context of the postmaster, but >>> I've got doubts about doing it in libpq where we can't >assume we know >>> what the surrounding application will do. > >> No need to touch the frontend parts at all. Our APCs are server side >> only, so it's not a problem there. > >Oh really? What if the surrounding app uses APCs? Then I'd say it's that apps problem. And it's clearly nothing new - the problem has existed before. We'd certainly want to document it, though. I don't think it's a good idea in general to redefine something as fundamental as select, send, recv etc from libpq-fe.h (or files included from there). That will affect any and all socket calls in the program in question, including those that aren't in use in libpq. If we want to affect those, we need to find a way to do it without exposing it to the client apps. //Magnus
>> I don't think it's a good idea in general to redefine something as >> fundamental as select, send, recv etc from libpq-fe.h (or >files included >> from there). > >Certainly not; the redefinition would have to be in files that are not >part of the exported API. However this is not difficult. >port.h is not >included by libpq-fe.h. I just realised that. I read too many files in parallell and got the idea that it was included in libpq-fe.h. My mistake, sorry. Then yes, we probably want to get that thing done for libpq as well. Though socket operations in APCs arne't exactly common, we know for a fact that they can give you interesting and incorrect results :-) Whatever method we choose should probably be applied to the frontend as well, yes. //Magnus
Claudio Natoli <claudio.natoli@memetrics.com> writes: > Hopeless, or cute, work-around? It's possibly workable in the limited context of the postmaster, but I've got doubts about doing it in libpq where we can't assume we know what the surrounding application will do. regards, tom lane
Tom Lane wrote: > > No need to touch the frontend parts at all. Our APCs are server side > > only, so it's not a problem there. > > Oh really? What if the surrounding app uses APCs? To my mind, that's actually a vote *for* the work-around, and in /port (in order to have it propagate to all front-end code as well). Cheers, Claudio --- Certain disclaimers and policies apply to all email sent from Memetrics. For the full text of these disclaimers and policies see <a href="http://www.memetrics.com/emailpolicy.html">http://www.memetrics.com/em ailpolicy.html</a>
"Magnus Hagander" <mha@sollentuna.net> writes: >>> Hopeless, or cute, work-around? >> >> It's possibly workable in the limited context of the postmaster, but >> I've got doubts about doing it in libpq where we can't assume we know >> what the surrounding application will do. > No need to touch the frontend parts at all. Our APCs are server side > only, so it's not a problem there. Oh really? What if the surrounding app uses APCs? regards, tom lane
> Tom Lane wrote: > > What happens if the socket() call fails? > > Then we return in error. Refer to attachment. Doh. You *were* referring to the attachment. Ok, so it'll need a check to make sure the socket actually does get created, but this isn't a fatal flaw, right? [is a work in progress I bashed out about 30 minutes ago]. I'm actually pretty hopeful over this patch, passes regression tests just fine, lets us remove the win32_pgstat cruft, let's us easily mirror the unix semantics, ... Hopeless, or cute, work-around? Cheers, Claudio --- Certain disclaimers and policies apply to all email sent from Memetrics. For the full text of these disclaimers and policies see <a href="http://www.memetrics.com/emailpolicy.html">http://www.memetrics.com/em ailpolicy.html</a>
Tom Lane wrote: > Claudio Natoli <claudio.natoli@memetrics.com> writes: > > ... However, it just occured to me that > > we could wrap select() by augmenting the read_mask with an > addition socket, > > that we know will never be touched, and checking this > socket on a "valid" > > return. If this socket is still set, we know we got bitten > by the APC/socket > > interaction bug, and can set errno accordingly. > > What happens if the socket() call fails? Then we return in error. Refer to attachment. > Even if it succeeds, I don't know what the semantics are of selecting on an un-bound socket ... it > might perhaps show as error state, for instance. select doesn't need a socket to be bound to select on it, afaik. In any case, it isn't necessary under win32, which is what we are discussing. Cheers, Claudio --- Certain disclaimers and policies apply to all email sent from Memetrics. For the full text of these disclaimers and policies see <a href="http://www.memetrics.com/emailpolicy.html">http://www.memetrics.com/em ailpolicy.html</a>
Claudio Natoli <claudio.natoli@memetrics.com> writes: > ... However, it just occured to me that > we could wrap select() by augmenting the read_mask with an addition socket, > that we know will never be touched, and checking this socket on a "valid" > return. If this socket is still set, we know we got bitten by the APC/socket > interaction bug, and can set errno accordingly. What happens if the socket() call fails? Even if it succeeds, I don't know what the semantics are of selecting on an un-bound socket ... it might perhaps show as error state, for instance. regards, tom lane
Hi all, Magnus Hagander wrote: > The third option is to redefine all these functions into our own, and > implement our own emulation layer. This means our own select(), send(), > recv() (more? I don't think so). And have these call the native winsock > APIs (WSAEventSelect(), WSASend(), WSARecv() etc). These functions are > designed to work in an APC environment. I personally think this is the last possible option; I'd much rather stay as close to *nix as possible. As you said, and I agree whole-heartedly that there is a lot more code that can go wrong, and it seems unnecessary if we have a suitable workaround, which I think we might. > If you think my suggestion above is not a good one then yes, we can do > this one. But I don't feel all that good about it (and specifically, I > have not had it confirmed about recv() and send() yet - have asked, but > not received a response yet). Reading back through our discussions, I'm not at all concerned about recv() + send(), as they are well behaved. When interrupted, they SetLastError() reasonably, and return < 0. select(), however, is a different story. On APC interrupt, it returns a seemingly valid value, doesn't set any of errno/WSAGetLastError/GetLastError, and leaves the FD_SETs set, which we concluded was impossible to work-around. However, it just occured to me that we could wrap select() by augmenting the read_mask with an addition socket, that we know will never be touched, and checking this socket on a "valid" return. If this socket is still set, we know we got bitten by the APC/socket interaction bug, and can set errno accordingly. I've attached a proposal for /port. Comments? Cheers, Claudio --- Certain disclaimers and policies apply to all email sent from Memetrics. For the full text of these disclaimers and policies see <a href="http://www.memetrics.com/emailpolicy.html">http://www.memetrics.com/em ailpolicy.html</a>
Attachment
"Magnus Hagander" <mha@sollentuna.net> writes: > I *think* we can do a better job with a not-so-large amount of effort, > ... > I'm willing to give it a try. If it turns out to be a lot of work, or > way too much code, I'll drop the idea. Sure, give it a shot and see what you get. regards, tom lane
>> The third option is to redefine all these functions into our own, and >> implement our own emulation layer. This means our own >select(), send(), >> recv() (more? I don't think so). And have these call the >native winsock >> APIs (WSAEventSelect(), WSASend(), WSARecv() etc). These >functions are >> designed to work in an APC environment. > >Pardon me for not having paid close enough attention, but what versions >of select() and friends are we relying on now? Is it really reasonable >to think that we can do a better job without a large amount of effort? oh yeah, I should add that these "new-style" functions are very similar to the original functions in syntax and semantics - they're just changed to interface with the win32 style of blocking/signalling/waiting. //Magnus
"Magnus Hagander" <mha@sollentuna.net> writes: > The third option is to redefine all these functions into our own, and > implement our own emulation layer. This means our own select(), send(), > recv() (more? I don't think so). And have these call the native winsock > APIs (WSAEventSelect(), WSASend(), WSARecv() etc). These functions are > designed to work in an APC environment. Pardon me for not having paid close enough attention, but what versions of select() and friends are we relying on now? Is it really reasonable to think that we can do a better job without a large amount of effort? regards, tom lane
> Your patch has been added to the PostgreSQL unapplied patches list at: Bruce, please cancel the original patch. The attached patch is for (possible) application to HEAD, following meritocratic "contest" [socket.c for src/port] Been iterating with Magnus extensively off-list, and he is well on the way to an engineered approach. Let's wait and see where that gets to, and then search for community consensus between the two approaches. Cheers, Claudio PS. Tested extensively, but if anyone wants to patch their own source base and try it out any feedback would be most welcome. --- Certain disclaimers and policies apply to all email sent from Memetrics. For the full text of these disclaimers and policies see <a href="http://www.memetrics.com/emailpolicy.html">http://www.memetrics.com/em ailpolicy.html</a>
Attachment
This patch has been withdrawn by the author. --------------------------------------------------------------------------- Claudio Natoli wrote: > > For application to HEAD. > > This should take care of most, if not all, cases where a backend can be > interrupted in a blocking socket operation by a signal which itself performs > a socket operation (which interacts badly with our APC-based signal > implementation). > > --- > Certain disclaimers and policies apply to all email sent from Memetrics. > For the full text of these disclaimers and policies see > <a > href="http://www.memetrics.com/emailpolicy.html">http://www.memetrics.com/em > ailpolicy.html</a> > > [ Attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 9: the planner will ignore your desire to choose an index scan if your > joining column's datatypes do not match -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073