Thread: patch: garbage error strings in libpq

patch: garbage error strings in libpq

From

jtv@xs4all.nl

Date:

05 July 2005, 08:26:14

Several libpqxx users have been reporting odd problems with certain error
messages generated by libpq.  One of them was the inclusion of garbage
data.

As it turns out, src/interfaces/libpq/fe-misc.c contains several instances
of this construct:

  printfPQExpBuffer(&conn->ErrorMessage,
    libpq_gettext("error: %s"),
    SOCK_STRERROR(SOCK_ERRNO, buffer, sizeof(buffer)));

This may occur in other source files as well.  On Unix-like systems,
SOCK_ERRNO defines to plain errno--which is likely to be overwritten by
the libpq_gettext().  I'm attaching a patch that fixes these instances by
introducing a named pointer to the SOCK_STRERROR message, initialized
before either of the other function calls.

Another approach would have been to make libpq_gettext() preserve errno.
It's tempting, but I'm not sure it would be valid from a language-lawyer
point of view.  There is no sequence point between the evaluations of
libpq_gettext() and SOCK_STRERROR().  From what I vaguely remember hearing
somewhere in the distant past, that means that theoretically they may be
evaluated not just in any order but even in parallel.  I guess it may
actually happen if both inlining and scheduling are sufficiently
aggressive.  Even if libpq_gettext() is made to restore errno, it will
still have to pollute errno at some points during its execution.


Jeroen

Attachment

fe-misc.patch

Re: patch: garbage error strings in libpq

From

Tom Lane

Date:

05 July 2005, 11:38:56

jtv@xs4all.nl writes:
> Another approach would have been to make libpq_gettext() preserve errno.

That seems like a far easier, cleaner, and more robust fix than this.

Moreover I don't believe that this approach works either, as the result
of strerror() is not guaranteed still usable after another strerror call
(ie, it can use a static buffer repeatedly), so you'd still have the
problem if libpq_gettext invokes strerror.  I suppose that a really
robust solution would involve libpq_gettext saving errno, restoring
errno, and invoking strerror() again ...

            regards, tom lane

Re: patch: garbage error strings in libpq

From

jtv@xs4all.nl

Date:

06 July 2005, 03:18:18

Tom Lane wrote:
> jtv@xs4all.nl writes:
>> Another approach would have been to make libpq_gettext() preserve errno.
>
> That seems like a far easier, cleaner, and more robust fix than this.

Provided that either:

(a) the C standard has added a sequence point between the arguments in a
function call, which AFAIK wasn't there before, or the sequence point was
there all along (and the compiler implements it);
(b) the compiler is sufficiently naive;
(c) you get lucky with instruction scheduling on your particular
architecture.

This is why I called this approach was "tempting," but didn't go for it.
I felt it was better to really fix the instances I found first, then see
what patterns emerge and refactor.

Like maybe a wrapper for printfPQExpBuffer() that takes a PGconn *, an
untranslated format string, and varargs; this in turn can do the
libpq_gettext().  That would cover all uses of printfPQExpBuffer() in
libpq--except for one of the out-of-memory errors where no translation is
done, which may have been unintentional (and this bug is again duplicated
in the code).

> Moreover I don't believe that this approach works either, as the result
> of strerror() is not guaranteed still usable after another strerror call
> (ie, it can use a static buffer repeatedly), so you'd still have the
> problem if libpq_gettext invokes strerror.  I suppose that a really
> robust solution would involve libpq_gettext saving errno, restoring
> errno, and invoking strerror() again ...

Check again.  The calls to strerror() are routed through pqStrerror()
which copies the error message to the buffer, or in the case of GNU
strerror_r(), at least ensures it is in some reusable location.

Jeroen

Re: patch: garbage error strings in libpq

From

Neil Conway

Date:

06 July 2005, 04:56:31

jtv@xs4all.nl wrote:
> (a) the C standard has added a sequence point between the arguments in a
> function call, which AFAIK wasn't there before, or the sequence point was
> there all along (and the compiler implements it);

Per C99 6.5.2.2.10, a sequence point occurs between the evaluation of
the arguments to a function and the call of the function itself.
Therefore a sequence point occurs before the call to libpq_gettext(). So
ISTM having libpq_gettext() preserve errno should work.

-Neil

Re: patch: garbage error strings in libpq

From

jtv@xs4all.nl

Date:

06 July 2005, 06:13:28

Neil Conway wrote:

> Per C99 6.5.2.2.10, a sequence point occurs between the evaluation of
> the arguments to a function and the call of the function itself.
> Therefore a sequence point occurs before the call to libpq_gettext(). So
> ISTM having libpq_gettext() preserve errno should work.

In C99, at least.  But that's not the dialect postgres is written in; even
gcc 4.0 leaves C99 support turned off by default.

Does anyone know what the situation is in C89, or whatever the applicable
standard is?

Jeroen

Re: patch: garbage error strings in libpq

From

Neil Conway

Date:

06 July 2005, 06:39:34

jtv@xs4all.nl wrote:
> Does anyone know what the situation is in C89, or whatever the applicable
> standard is?

[ *looks* ]

The text is the same in both versions:

http://dev.unicals.com/papers/c89-draft.html#3.3.2.2

"The order of evaluation of the function designator, the arguments, and
subexpressions within the arguments is unspecified, but there is a
sequence point before the actual call."

(On reading this more closely, I suppose you could make the argument
that a function call that takes place in the argument list of another
function call is a "subexpression within the [outer function's]
arguments", so the order of evaluation prior to the call of the outer
function would be undefined. But I don't think that's the right reading
of the standard.)

-Neil

Re: patch: garbage error strings in libpq

From

jtv@xs4all.nl

Date:

06 July 2005, 07:35:57

Neil Conway wrote:
> The text is the same in both versions:
>
> http://dev.unicals.com/papers/c89-draft.html#3.3.2.2
>
> "The order of evaluation of the function designator, the arguments, and
> subexpressions within the arguments is unspecified, but there is a
> sequence point before the actual call."
>
> (On reading this more closely, I suppose you could make the argument
> that a function call that takes place in the argument list of another
> function call is a "subexpression within the [outer function's]
> arguments", so the order of evaluation prior to the call of the outer
> function would be undefined. But I don't think that's the right reading
> of the standard.)

That is pretty much what I remember hearing at the time.

To me what this says is only that (the program will behave as if) all
arguments shall be evaluated before the function is called--but in an
otherwise unspecified order.  What we're currently doing has this basic
shape:

int x = 0;
static int a() { x = 1; return x; }
static int b() { printf("b sees x=%d\n", x); return x;}
static int c(int l, int r) { printf("c sees x=%d\n", x); return x; }
int main()
{
    return c(a(),b());
}

Now, the best we can hope for based on what you quote is that we will see
"c sees x=1" but we don't know what we'll see coming out of b().  And the
wording makes it equally clear that we would not change any of this by
doing c(b(),a()) instead of c(a(),b()).

The "best we can hope for" depends on the definition of "unspecified."
This is where it gets really tricky.  I see two different possible
implications depending on that definition:

(optimistic) The program will execute as if the code said either "t1=a();
t2=b(); c(t1,t2)" or "t1=b(); t2=a(); c(t1,t2)" but we don't know which.
I wouldn't bet on this one as a guarantee, although naive compilers will
probably behave like this.

(pessimistic) The executions of a() and b() may be interspersed freely,
although as a practical matter the compiler will respect the sequence
points within each.  But that still means there is no sequence point
between any one given expression in the execution of a() and any other in
the execution of b(), therefore setting a variable in a() and also
touching it in b() leaves behaviour undefined.  The program may react in
any way it likes without violating the standard, including traveling back
in time and refusing to start at all (really!), going off to make tea, or
standing on its head and donning a tutu.

A well-known way to trigger undefined behaviour is "x++=x++;" because
there is no sequence point between the two side effects.  Try it: gcc will
give you a stern warning.  Given that there is no sequence point between
argument expressions, as per the paragraph you quote, the same must go for
"c(x++,x++)".  So then it becomes dubious that there is suddenly a
guarantee for "c(a(),b())"!

Jeroen

Re: patch: garbage error strings in libpq

From

Neil Conway

Date:

06 July 2005, 09:37:22

jtv@xs4all.nl wrote:
> That is pretty much what I remember hearing at the time.

> A well-known way to trigger undefined behaviour is "x++=x++;" because
> there is no sequence point between the two side effects.  Try it: gcc will
> give you a stern warning.  Given that there is no sequence point between
> argument expressions, as per the paragraph you quote, the same must go for
> "c(x++,x++)".  So then it becomes dubious that there is suddenly a
> guarantee for "c(a(),b())"!

Right; my interpretation is that the "sequence point before function
call" rule applies recursively. So in c(a(...), b(...)), there are in
fact three sequence points, which precede the calls of a, b, and c.
Shouldn't that be sufficient to ensure that the evaluation of
libpq_gettext() is not interleaved with the evaluation of the other
arguments to the printf()?

-Neil

Re: patch: garbage error strings in libpq

From

Tom Lane

Date:

06 July 2005, 10:59:11

Neil Conway <neilc@samurai.com> writes:
> Right; my interpretation is that the "sequence point before function
> call" rule applies recursively. So in c(a(...), b(...)), there are in
> fact three sequence points, which precede the calls of a, b, and c.
> Shouldn't that be sufficient to ensure that the evaluation of
> libpq_gettext() is not interleaved with the evaluation of the other
> arguments to the printf()?

I think this is all irrelevant language-lawyering; jtv spotted the true
problem which is that we do not protect errno during the *first* call of
libpq_gettext.

            regards, tom lane

Re: patch: garbage error strings in libpq

From

Neil Conway

Date:

06 July 2005, 11:09:04

Tom Lane wrote:
> I think this is all irrelevant language-lawyering; jtv spotted the true
> problem which is that we do not protect errno during the *first* call of
> libpq_gettext.

I think you're missing the point. Obviously the current code is wrong,
the debate is over the best way to fix it. Jeroen's interpretation of
the spec suggests that merely having libpq_gettext() preserve errno is
not sufficient. I'm not convinced this his interpretation is correct,
but it is a question worth resolving.

-Neil

Re: patch: garbage error strings in libpq

From

Tom Lane

Date:

06 July 2005, 11:22:59

Neil Conway <neilc@samurai.com> writes:
> I think you're missing the point. Obviously the current code is wrong,
> the debate is over the best way to fix it. Jeroen's interpretation of
> the spec suggests that merely having libpq_gettext() preserve errno is
> not sufficient. I'm not convinced this his interpretation is correct,
> but it is a question worth resolving.

(1) The fact that gettext works at all seems to me to be sufficient
empirical evidence that it's a working solution.  (2) Whether there are
sequence points in the function call or not, there most certainly are
sequence points inside each called function.  The spec allows the
functions involved to be called in an unspecified order, but it doesn't
allow the compiler to interleave the execution of several functions.
(3) Interpretation or not, the approach that Jeroen proposes is
unmaintainable; people will not remember to use such a kluge everywhere
they'd need to.  I'd much rather fix it in one place and do whatever we
have to do to keep the compiler from breaking that one place.

            regards, tom lane

Re: patch: garbage error strings in libpq

From

jtv@xs4all.nl

Date:

08 July 2005, 00:34:07

Neil Conway wrote:
> Tom Lane wrote:
>> I think this is all irrelevant language-lawyering; jtv spotted the true
>> problem which is that we do not protect errno during the *first* call of
>> libpq_gettext.
>
> I think you're missing the point. Obviously the current code is wrong,
> the debate is over the best way to fix it. Jeroen's interpretation of
> the spec suggests that merely having libpq_gettext() preserve errno is
> not sufficient. I'm not convinced this his interpretation is correct,
> but it is a question worth resolving.

Agree totally.  If my interpretation is wrong then I'll happily get on
with my life and let everyone else do the same.  I was at that point once
already, but Neil took another close look at the relevant part of the C
standard he dug up and found this potential problem.

I really don't like playing the smart-alec language lawyer here, but I've
been following compiler developments and they are moving in a direction
that makes this relevant.  I do want to be sure that we're shipping
correct code, not just code that practically speaking suppresses the
symptoms of its bugs for a while, on most compilers, for the most popular
CPU architectures.

Moreover, I don't want to go through all of this again when the regression
occurs and we think we've solved it forever and the problem must be
somewhere else.  I've been losing enough sleep over what I thought must be
bugs in libpqxx that I just couldn't put my finger on.

Jeroen

Re: patch: garbage error strings in libpq

From

jtv@xs4all.nl

Date:

08 July 2005, 02:41:17

Tom Lane wrote:
>
> (1) The fact that gettext works at all seems to me to be sufficient
> empirical evidence that it's a working solution.

Tom, you have GOT to be joking.  I agree with some of the things you say
(see below), but here you're just not making sense.  Consider:

1. We're only talking about a very specific property of gettext(), namely
restoration of errno in calls that are not ordered w.r.t. a use of errno.
This is only relevant in a very limited number of all calls to gettext(),
so the mere fact that gettext() works doesn't prove anything relevant.

2. In those limited few cases, and for this specific property, we already
know that something in this general area is *not* working, so you'd have
to find, fix and test that before you could even make this argument.

3. Given that a call to gettext() crosses not just object-file but actual
library boundaries, what do you think the chances are--regardless of
language garantees and compiler aggressiveness--that we'd see the compiler
interleaving instructions from gettext() with a use of errno on the other
side of that boundary?  Does the assumption that gettext() works at all,
therefore, make a reliable indication that there is no problem?

4. In the case of libpq_gettext(), we're not crossing library boundaries.
We're not even crossing object-file boundaries in some cases.  That makes
the chances of instructions being scheduled across a call much
greater--and the question about sequence points much more relevant.

5. From what I understand, the latest versions of gcc are actually
beginning to schedule instructions across function call boundaries.  That
will undoubtedly increase in the future.  There is even a new feature that
can, as far as I can see, lead to instruction scheduling across source
file boundaries.

> (2) Whether there are
> sequence points in the function call or not, there most certainly are
> sequence points inside each called function.  The spec allows the
> functions involved to be called in an unspecified order, but it doesn't
> allow the compiler to interleave the execution of several functions.

That would answer the big question here, but where does it say that?  I
saw Neil's point that the sequence points before function calls apply for
the nested calls as well as the outer one, but there is no ordering
between those "nested-call" sequence points.  It's all easy when you have
a total ordering, but we're in a partial ordering here.

So the missing piece of the puzzle is still that same question.  Obviously
the compiler isn't allowed to interleave function executions (or at least
not let you find out about it) if there is a sequence point _between_
them.  There is in sequential calls, for example, because there is a
sequence point before the function is entered.  But what happens if there
isn't a separating sequence point, like here?

> (3) Interpretation or not, the approach that Jeroen proposes is
> unmaintainable; people will not remember to use such a kluge everywhere
> they'd need to.  I'd much rather fix it in one place and do whatever we
> have to do to keep the compiler from breaking that one place.

That means you haven't seen the approach I suggested the day before
yesterday, where I also explained that I felt it best to fix the incumbent
bugs first before refactoring the result.  You're still talking about my
initial quick-fix patch, which IMHO could have gone in already while we
were arguing over a long-term solution.  That patch was ugly solely in the
sense that the original broken code was ugly; if you want to use words
like "kluge" then please direct them there.  The copy-paste style of
writing loops didn't help either.

As I suggested on Wednesday, maybe we can wrap the combination of
printfPQExpBuffer() and libpq_gettext() into a single function that takes
a PGconn *, a format string, and varargs.  This makes the calls a lot
shorter and simpler, it moots the question of sequence points because
errno would be the first thing to be evaluated, and en passant we'll fix a
few cases where we forgot to call libpq_gettext().  We'd have to have a
similar wrapper for snprintf(), but then I think all cases in libpq are
covered.

Jeroen

Re: patch: garbage error strings in libpq

From

Tom Lane

Date:

08 July 2005, 10:58:53

jtv@xs4all.nl writes:
> That would answer the big question here, but where does it say that?  I
> saw Neil's point that the sequence points before function calls apply for
> the nested calls as well as the outer one, but there is no ordering
> between those "nested-call" sequence points.  It's all easy when you have
> a total ordering, but we're in a partial ordering here.

This is utter nonsense.  If the sequence points within a function do not
follow (in an execution-order sense) the one at the call site, then no C
program on the planet will manage to work.

            regards, tom lane

Re: patch: garbage error strings in libpq

From

Tom Lane

Date:

08 July 2005, 11:15:33

jtv@xs4all.nl writes:
> That would answer the big question here, but where does it say that?

Also, if you really insist on an authoritative statement, try this text
(from Annex D of the C99 draft standard, "Formal model of sequence
points"):

       D.2.2  Function calls                                        |

       [#1] Where an expression involves a function call, that call |
       is ``atomic'' for the purposes of this model.  There must be |
       a  sequence point immediately before (see 6.5.2.2) and after |
       each function  call  (either  because  it  ends  in  a  full |
       expression,  or  because it is required by 7.1.4), and so it |
       can be seen that  the  effects  of  the  call   --  for  the |
       purposes of the surrounding expression  -- can be determined |
       purely by the read and write events involved in it, ignoring |
       their  ordering.   These  events cannot be interspersed with |
       events from outside the call.  Therefore this model treats a |
       function call as being a sequence point.                     |

Cross-function optimization by the compiler is all well and dandy,
but it is on the compiler's head to make sure it doesn't change
the observable semantics of the program.  Not ours.

            regards, tom lane

Re: patch: garbage error strings in libpq

From

jtv@xs4all.nl

Date:

09 July 2005, 10:10:09

Tom Lane wrote:
>> That would answer the big question here, but where does it say that?  I
>> saw Neil's point that the sequence points before function calls apply
>> for
>> the nested calls as well as the outer one, but there is no ordering
>> between those "nested-call" sequence points.  It's all easy when you
>> have
>> a total ordering, but we're in a partial ordering here.
>
> This is utter nonsense.  If the sequence points within a function do not
> follow (in an execution-order sense) the one at the call site, then no C
> program on the planet will manage to work.

That's not what I said though.  In retrospect you could interpret it that
way, but what would be the point in choosing an obviously nonsensical
interpretation?  Yes, the sequence points within a function obviously
follow the function point marking the call itself.  But that's beside the
issue.  It's already implicitly assumed in everything we say here, as I'm
sure you're aware.  When I say 'those "nested-call" sequence points' I
refer to the two sequence points marking the two respective nested calls.

Here's the problem again: we have two calls to respective functions a()
and b(), each call of course marked by a sequence point (call them a0 and
b0) and each function comprising a string of sequence points internally
(say a1a2..an and b1b2..bm) whose internal order must be respected.  No
problem so far, I hope.  a1a2 etc. must be performed in that order and
b1b2 etc. must be performed in that order.  But no ordering is imposed
between a0 and b0!

Now, if there is a requirement somewhere in the standard that says, like
you say, that the execution of two function calls may not be interleaved,
then we're in the clear.  But I didn't find anything to say that.  Absent
that guarantee AFAICS the compiler is allowed to pick any interleaving of
a() and b() that respects their respective strings of sequence points:
a0b0a1b1a2b2 would do, or b0b1a0a1b2a2, and so on.  If that is the case,
then your proposed solution may start to fail mysteriously as compilers
progress.

And now that we're on the subject, shouldn't the default cases of the SSL
error-code switches in pqsecure_read() and pqsecure_write()
(src/interfaces/libpq/fe-secure.c) restore errno as well?

Jeroen

Re: patch: garbage error strings in libpq

From

jtv@xs4all.nl

Date:

09 July 2005, 10:28:09

Tom Lane wrote:

>> That would answer the big question here, but where does it say that?
>
> Also, if you really insist on an authoritative statement, try this text
> (from Annex D of the C99 draft standard, "Formal model of sequence
> points"):

Thank you, that would answer the question.  There is no "also" about it;
it's exactly what I was asking all along.  The conclusive answer for us
would be in the C89 standard of course, where (at least in the draft that
Neil quoted) I haven't been able to find anything like this.  :-(

Jeroen

Re: patch: garbage error strings in libpq

From

Neil Conway

Date:

09 July 2005, 10:34:41

jtv@xs4all.nl wrote:
> Thank you, that would answer the question.  There is no "also" about it;
> it's exactly what I was asking all along.  The conclusive answer for us
> would be in the C89 standard of course, where (at least in the draft that
> Neil quoted) I haven't been able to find anything like this.  :-(

If in the future when compilers actually do begin applying aggressive
enough optimization that this might be a problem in practice, it seems
likely they will also have updated to C99. It seems a little much to
imagine (a) a compiler that does this in the first place (b) a compiler
that varies its interpretation of sequence points in an extremely subtle
way depending on the dialect of C in use. IOW, I think C99 is sufficient.

-Neil

Re: patch: garbage error strings in libpq

From

Stephan Szabo

Date:

09 July 2005, 14:09:32

On Sat, 9 Jul 2005 jtv@xs4all.nl wrote:

> Tom Lane wrote:
>
> >> That would answer the big question here, but where does it say that?
> >
> > Also, if you really insist on an authoritative statement, try this text
> > (from Annex D of the C99 draft standard, "Formal model of sequence
> > points"):
>
> Thank you, that would answer the question.  There is no "also" about it;
> it's exactly what I was asking all along.  The conclusive answer for us
> would be in the C89 standard of course, where (at least in the draft that
> Neil quoted) I haven't been able to find anything like this.  :-(

I believe overlap of functions in the same expression was disallowed by
the response to defect report 087.  The only reference I've been able to
find right now (since the committee seems to have removed the C89 DRs from
their site) is in the response to DR 287 which includes:

"Proposed Committee Response
As noted in the response to DR 087, function calls in the same expression
do not overlap. This has not changed for C99."

Re: patch: garbage error strings in libpq

From

jtv@xs4all.nl

Date:

10 July 2005, 04:38:57

Stephan Szabo wrote:

> I believe overlap of functions in the same expression was disallowed by
> the response to defect report 087.  The only reference I've been able to
> find right now (since the committee seems to have removed the C89 DRs from
> their site) is in the response to DR 287 which includes:
>
> "Proposed Committee Response
> As noted in the response to DR 087, function calls in the same expression
> do not overlap. This has not changed for C99."

Yay!  Good one.  The quote Tom gave was not in the C99 draft I have access
to, and the 1998 C++ standard only contains a footnote implying that the C
standard at the time _did_ allow overlap of function calls whose order is
not specified.  But what you quote here (apart from the "Proposed")
implies that C89 makes the guarantees we need.

Jeroen