Thread: patch: garbage error strings in libpq
Several libpqxx users have been reporting odd problems with certain error messages generated by libpq. One of them was the inclusion of garbage data. As it turns out, src/interfaces/libpq/fe-misc.c contains several instances of this construct: printfPQExpBuffer(&conn->ErrorMessage, libpq_gettext("error: %s"), SOCK_STRERROR(SOCK_ERRNO, buffer, sizeof(buffer))); This may occur in other source files as well. On Unix-like systems, SOCK_ERRNO defines to plain errno--which is likely to be overwritten by the libpq_gettext(). I'm attaching a patch that fixes these instances by introducing a named pointer to the SOCK_STRERROR message, initialized before either of the other function calls. Another approach would have been to make libpq_gettext() preserve errno. It's tempting, but I'm not sure it would be valid from a language-lawyer point of view. There is no sequence point between the evaluations of libpq_gettext() and SOCK_STRERROR(). From what I vaguely remember hearing somewhere in the distant past, that means that theoretically they may be evaluated not just in any order but even in parallel. I guess it may actually happen if both inlining and scheduling are sufficiently aggressive. Even if libpq_gettext() is made to restore errno, it will still have to pollute errno at some points during its execution. Jeroen
Attachment
jtv@xs4all.nl writes: > Another approach would have been to make libpq_gettext() preserve errno. That seems like a far easier, cleaner, and more robust fix than this. Moreover I don't believe that this approach works either, as the result of strerror() is not guaranteed still usable after another strerror call (ie, it can use a static buffer repeatedly), so you'd still have the problem if libpq_gettext invokes strerror. I suppose that a really robust solution would involve libpq_gettext saving errno, restoring errno, and invoking strerror() again ... regards, tom lane
Tom Lane wrote: > jtv@xs4all.nl writes: >> Another approach would have been to make libpq_gettext() preserve errno. > > That seems like a far easier, cleaner, and more robust fix than this. Provided that either: (a) the C standard has added a sequence point between the arguments in a function call, which AFAIK wasn't there before, or the sequence point was there all along (and the compiler implements it); (b) the compiler is sufficiently naive; (c) you get lucky with instruction scheduling on your particular architecture. This is why I called this approach was "tempting," but didn't go for it. I felt it was better to really fix the instances I found first, then see what patterns emerge and refactor. Like maybe a wrapper for printfPQExpBuffer() that takes a PGconn *, an untranslated format string, and varargs; this in turn can do the libpq_gettext(). That would cover all uses of printfPQExpBuffer() in libpq--except for one of the out-of-memory errors where no translation is done, which may have been unintentional (and this bug is again duplicated in the code). > Moreover I don't believe that this approach works either, as the result > of strerror() is not guaranteed still usable after another strerror call > (ie, it can use a static buffer repeatedly), so you'd still have the > problem if libpq_gettext invokes strerror. I suppose that a really > robust solution would involve libpq_gettext saving errno, restoring > errno, and invoking strerror() again ... Check again. The calls to strerror() are routed through pqStrerror() which copies the error message to the buffer, or in the case of GNU strerror_r(), at least ensures it is in some reusable location. Jeroen
jtv@xs4all.nl wrote: > (a) the C standard has added a sequence point between the arguments in a > function call, which AFAIK wasn't there before, or the sequence point was > there all along (and the compiler implements it); Per C99 6.5.2.2.10, a sequence point occurs between the evaluation of the arguments to a function and the call of the function itself. Therefore a sequence point occurs before the call to libpq_gettext(). So ISTM having libpq_gettext() preserve errno should work. -Neil
Neil Conway wrote: > Per C99 6.5.2.2.10, a sequence point occurs between the evaluation of > the arguments to a function and the call of the function itself. > Therefore a sequence point occurs before the call to libpq_gettext(). So > ISTM having libpq_gettext() preserve errno should work. In C99, at least. But that's not the dialect postgres is written in; even gcc 4.0 leaves C99 support turned off by default. Does anyone know what the situation is in C89, or whatever the applicable standard is? Jeroen
jtv@xs4all.nl wrote: > Does anyone know what the situation is in C89, or whatever the applicable > standard is? [ *looks* ] The text is the same in both versions: http://dev.unicals.com/papers/c89-draft.html#3.3.2.2 "The order of evaluation of the function designator, the arguments, and subexpressions within the arguments is unspecified, but there is a sequence point before the actual call." (On reading this more closely, I suppose you could make the argument that a function call that takes place in the argument list of another function call is a "subexpression within the [outer function's] arguments", so the order of evaluation prior to the call of the outer function would be undefined. But I don't think that's the right reading of the standard.) -Neil
Neil Conway wrote: > The text is the same in both versions: > > http://dev.unicals.com/papers/c89-draft.html#3.3.2.2 > > "The order of evaluation of the function designator, the arguments, and > subexpressions within the arguments is unspecified, but there is a > sequence point before the actual call." > > (On reading this more closely, I suppose you could make the argument > that a function call that takes place in the argument list of another > function call is a "subexpression within the [outer function's] > arguments", so the order of evaluation prior to the call of the outer > function would be undefined. But I don't think that's the right reading > of the standard.) That is pretty much what I remember hearing at the time. To me what this says is only that (the program will behave as if) all arguments shall be evaluated before the function is called--but in an otherwise unspecified order. What we're currently doing has this basic shape: int x = 0; static int a() { x = 1; return x; } static int b() { printf("b sees x=%d\n", x); return x;} static int c(int l, int r) { printf("c sees x=%d\n", x); return x; } int main() { return c(a(),b()); } Now, the best we can hope for based on what you quote is that we will see "c sees x=1" but we don't know what we'll see coming out of b(). And the wording makes it equally clear that we would not change any of this by doing c(b(),a()) instead of c(a(),b()). The "best we can hope for" depends on the definition of "unspecified." This is where it gets really tricky. I see two different possible implications depending on that definition: (optimistic) The program will execute as if the code said either "t1=a(); t2=b(); c(t1,t2)" or "t1=b(); t2=a(); c(t1,t2)" but we don't know which. I wouldn't bet on this one as a guarantee, although naive compilers will probably behave like this. (pessimistic) The executions of a() and b() may be interspersed freely, although as a practical matter the compiler will respect the sequence points within each. But that still means there is no sequence point between any one given expression in the execution of a() and any other in the execution of b(), therefore setting a variable in a() and also touching it in b() leaves behaviour undefined. The program may react in any way it likes without violating the standard, including traveling back in time and refusing to start at all (really!), going off to make tea, or standing on its head and donning a tutu. A well-known way to trigger undefined behaviour is "x++=x++;" because there is no sequence point between the two side effects. Try it: gcc will give you a stern warning. Given that there is no sequence point between argument expressions, as per the paragraph you quote, the same must go for "c(x++,x++)". So then it becomes dubious that there is suddenly a guarantee for "c(a(),b())"! Jeroen
jtv@xs4all.nl wrote: > That is pretty much what I remember hearing at the time. > A well-known way to trigger undefined behaviour is "x++=x++;" because > there is no sequence point between the two side effects. Try it: gcc will > give you a stern warning. Given that there is no sequence point between > argument expressions, as per the paragraph you quote, the same must go for > "c(x++,x++)". So then it becomes dubious that there is suddenly a > guarantee for "c(a(),b())"! Right; my interpretation is that the "sequence point before function call" rule applies recursively. So in c(a(...), b(...)), there are in fact three sequence points, which precede the calls of a, b, and c. Shouldn't that be sufficient to ensure that the evaluation of libpq_gettext() is not interleaved with the evaluation of the other arguments to the printf()? -Neil
Neil Conway <neilc@samurai.com> writes: > Right; my interpretation is that the "sequence point before function > call" rule applies recursively. So in c(a(...), b(...)), there are in > fact three sequence points, which precede the calls of a, b, and c. > Shouldn't that be sufficient to ensure that the evaluation of > libpq_gettext() is not interleaved with the evaluation of the other > arguments to the printf()? I think this is all irrelevant language-lawyering; jtv spotted the true problem which is that we do not protect errno during the *first* call of libpq_gettext. regards, tom lane
Tom Lane wrote: > I think this is all irrelevant language-lawyering; jtv spotted the true > problem which is that we do not protect errno during the *first* call of > libpq_gettext. I think you're missing the point. Obviously the current code is wrong, the debate is over the best way to fix it. Jeroen's interpretation of the spec suggests that merely having libpq_gettext() preserve errno is not sufficient. I'm not convinced this his interpretation is correct, but it is a question worth resolving. -Neil
Neil Conway <neilc@samurai.com> writes: > I think you're missing the point. Obviously the current code is wrong, > the debate is over the best way to fix it. Jeroen's interpretation of > the spec suggests that merely having libpq_gettext() preserve errno is > not sufficient. I'm not convinced this his interpretation is correct, > but it is a question worth resolving. (1) The fact that gettext works at all seems to me to be sufficient empirical evidence that it's a working solution. (2) Whether there are sequence points in the function call or not, there most certainly are sequence points inside each called function. The spec allows the functions involved to be called in an unspecified order, but it doesn't allow the compiler to interleave the execution of several functions. (3) Interpretation or not, the approach that Jeroen proposes is unmaintainable; people will not remember to use such a kluge everywhere they'd need to. I'd much rather fix it in one place and do whatever we have to do to keep the compiler from breaking that one place. regards, tom lane
Neil Conway wrote: > Tom Lane wrote: >> I think this is all irrelevant language-lawyering; jtv spotted the true >> problem which is that we do not protect errno during the *first* call of >> libpq_gettext. > > I think you're missing the point. Obviously the current code is wrong, > the debate is over the best way to fix it. Jeroen's interpretation of > the spec suggests that merely having libpq_gettext() preserve errno is > not sufficient. I'm not convinced this his interpretation is correct, > but it is a question worth resolving. Agree totally. If my interpretation is wrong then I'll happily get on with my life and let everyone else do the same. I was at that point once already, but Neil took another close look at the relevant part of the C standard he dug up and found this potential problem. I really don't like playing the smart-alec language lawyer here, but I've been following compiler developments and they are moving in a direction that makes this relevant. I do want to be sure that we're shipping correct code, not just code that practically speaking suppresses the symptoms of its bugs for a while, on most compilers, for the most popular CPU architectures. Moreover, I don't want to go through all of this again when the regression occurs and we think we've solved it forever and the problem must be somewhere else. I've been losing enough sleep over what I thought must be bugs in libpqxx that I just couldn't put my finger on. Jeroen
Tom Lane wrote: > > (1) The fact that gettext works at all seems to me to be sufficient > empirical evidence that it's a working solution. Tom, you have GOT to be joking. I agree with some of the things you say (see below), but here you're just not making sense. Consider: 1. We're only talking about a very specific property of gettext(), namely restoration of errno in calls that are not ordered w.r.t. a use of errno. This is only relevant in a very limited number of all calls to gettext(), so the mere fact that gettext() works doesn't prove anything relevant. 2. In those limited few cases, and for this specific property, we already know that something in this general area is *not* working, so you'd have to find, fix and test that before you could even make this argument. 3. Given that a call to gettext() crosses not just object-file but actual library boundaries, what do you think the chances are--regardless of language garantees and compiler aggressiveness--that we'd see the compiler interleaving instructions from gettext() with a use of errno on the other side of that boundary? Does the assumption that gettext() works at all, therefore, make a reliable indication that there is no problem? 4. In the case of libpq_gettext(), we're not crossing library boundaries. We're not even crossing object-file boundaries in some cases. That makes the chances of instructions being scheduled across a call much greater--and the question about sequence points much more relevant. 5. From what I understand, the latest versions of gcc are actually beginning to schedule instructions across function call boundaries. That will undoubtedly increase in the future. There is even a new feature that can, as far as I can see, lead to instruction scheduling across source file boundaries. > (2) Whether there are > sequence points in the function call or not, there most certainly are > sequence points inside each called function. The spec allows the > functions involved to be called in an unspecified order, but it doesn't > allow the compiler to interleave the execution of several functions. That would answer the big question here, but where does it say that? I saw Neil's point that the sequence points before function calls apply for the nested calls as well as the outer one, but there is no ordering between those "nested-call" sequence points. It's all easy when you have a total ordering, but we're in a partial ordering here. So the missing piece of the puzzle is still that same question. Obviously the compiler isn't allowed to interleave function executions (or at least not let you find out about it) if there is a sequence point _between_ them. There is in sequential calls, for example, because there is a sequence point before the function is entered. But what happens if there isn't a separating sequence point, like here? > (3) Interpretation or not, the approach that Jeroen proposes is > unmaintainable; people will not remember to use such a kluge everywhere > they'd need to. I'd much rather fix it in one place and do whatever we > have to do to keep the compiler from breaking that one place. That means you haven't seen the approach I suggested the day before yesterday, where I also explained that I felt it best to fix the incumbent bugs first before refactoring the result. You're still talking about my initial quick-fix patch, which IMHO could have gone in already while we were arguing over a long-term solution. That patch was ugly solely in the sense that the original broken code was ugly; if you want to use words like "kluge" then please direct them there. The copy-paste style of writing loops didn't help either. As I suggested on Wednesday, maybe we can wrap the combination of printfPQExpBuffer() and libpq_gettext() into a single function that takes a PGconn *, a format string, and varargs. This makes the calls a lot shorter and simpler, it moots the question of sequence points because errno would be the first thing to be evaluated, and en passant we'll fix a few cases where we forgot to call libpq_gettext(). We'd have to have a similar wrapper for snprintf(), but then I think all cases in libpq are covered. Jeroen
jtv@xs4all.nl writes: > That would answer the big question here, but where does it say that? I > saw Neil's point that the sequence points before function calls apply for > the nested calls as well as the outer one, but there is no ordering > between those "nested-call" sequence points. It's all easy when you have > a total ordering, but we're in a partial ordering here. This is utter nonsense. If the sequence points within a function do not follow (in an execution-order sense) the one at the call site, then no C program on the planet will manage to work. regards, tom lane
jtv@xs4all.nl writes: > That would answer the big question here, but where does it say that? Also, if you really insist on an authoritative statement, try this text (from Annex D of the C99 draft standard, "Formal model of sequence points"): D.2.2 Function calls | [#1] Where an expression involves a function call, that call | is ``atomic'' for the purposes of this model. There must be | a sequence point immediately before (see 6.5.2.2) and after | each function call (either because it ends in a full | expression, or because it is required by 7.1.4), and so it | can be seen that the effects of the call -- for the | purposes of the surrounding expression -- can be determined | purely by the read and write events involved in it, ignoring | their ordering. These events cannot be interspersed with | events from outside the call. Therefore this model treats a | function call as being a sequence point. | Cross-function optimization by the compiler is all well and dandy, but it is on the compiler's head to make sure it doesn't change the observable semantics of the program. Not ours. regards, tom lane
Tom Lane wrote: >> That would answer the big question here, but where does it say that? I >> saw Neil's point that the sequence points before function calls apply >> for >> the nested calls as well as the outer one, but there is no ordering >> between those "nested-call" sequence points. It's all easy when you >> have >> a total ordering, but we're in a partial ordering here. > > This is utter nonsense. If the sequence points within a function do not > follow (in an execution-order sense) the one at the call site, then no C > program on the planet will manage to work. That's not what I said though. In retrospect you could interpret it that way, but what would be the point in choosing an obviously nonsensical interpretation? Yes, the sequence points within a function obviously follow the function point marking the call itself. But that's beside the issue. It's already implicitly assumed in everything we say here, as I'm sure you're aware. When I say 'those "nested-call" sequence points' I refer to the two sequence points marking the two respective nested calls. Here's the problem again: we have two calls to respective functions a() and b(), each call of course marked by a sequence point (call them a0 and b0) and each function comprising a string of sequence points internally (say a1a2..an and b1b2..bm) whose internal order must be respected. No problem so far, I hope. a1a2 etc. must be performed in that order and b1b2 etc. must be performed in that order. But no ordering is imposed between a0 and b0! Now, if there is a requirement somewhere in the standard that says, like you say, that the execution of two function calls may not be interleaved, then we're in the clear. But I didn't find anything to say that. Absent that guarantee AFAICS the compiler is allowed to pick any interleaving of a() and b() that respects their respective strings of sequence points: a0b0a1b1a2b2 would do, or b0b1a0a1b2a2, and so on. If that is the case, then your proposed solution may start to fail mysteriously as compilers progress. And now that we're on the subject, shouldn't the default cases of the SSL error-code switches in pqsecure_read() and pqsecure_write() (src/interfaces/libpq/fe-secure.c) restore errno as well? Jeroen
Tom Lane wrote: >> That would answer the big question here, but where does it say that? > > Also, if you really insist on an authoritative statement, try this text > (from Annex D of the C99 draft standard, "Formal model of sequence > points"): Thank you, that would answer the question. There is no "also" about it; it's exactly what I was asking all along. The conclusive answer for us would be in the C89 standard of course, where (at least in the draft that Neil quoted) I haven't been able to find anything like this. :-( Jeroen
jtv@xs4all.nl wrote: > Thank you, that would answer the question. There is no "also" about it; > it's exactly what I was asking all along. The conclusive answer for us > would be in the C89 standard of course, where (at least in the draft that > Neil quoted) I haven't been able to find anything like this. :-( If in the future when compilers actually do begin applying aggressive enough optimization that this might be a problem in practice, it seems likely they will also have updated to C99. It seems a little much to imagine (a) a compiler that does this in the first place (b) a compiler that varies its interpretation of sequence points in an extremely subtle way depending on the dialect of C in use. IOW, I think C99 is sufficient. -Neil
On Sat, 9 Jul 2005 jtv@xs4all.nl wrote: > Tom Lane wrote: > > >> That would answer the big question here, but where does it say that? > > > > Also, if you really insist on an authoritative statement, try this text > > (from Annex D of the C99 draft standard, "Formal model of sequence > > points"): > > Thank you, that would answer the question. There is no "also" about it; > it's exactly what I was asking all along. The conclusive answer for us > would be in the C89 standard of course, where (at least in the draft that > Neil quoted) I haven't been able to find anything like this. :-( I believe overlap of functions in the same expression was disallowed by the response to defect report 087. The only reference I've been able to find right now (since the committee seems to have removed the C89 DRs from their site) is in the response to DR 287 which includes: "Proposed Committee Response As noted in the response to DR 087, function calls in the same expression do not overlap. This has not changed for C99."
Stephan Szabo wrote: > I believe overlap of functions in the same expression was disallowed by > the response to defect report 087. The only reference I've been able to > find right now (since the committee seems to have removed the C89 DRs from > their site) is in the response to DR 287 which includes: > > "Proposed Committee Response > As noted in the response to DR 087, function calls in the same expression > do not overlap. This has not changed for C99." Yay! Good one. The quote Tom gave was not in the C99 draft I have access to, and the 1998 C++ standard only contains a footnote implying that the C standard at the time _did_ allow overlap of function calls whose order is not specified. But what you quote here (apart from the "Proposed") implies that C89 makes the guarantees we need. Jeroen