Thread: Reasons not to like asprintf

Reasons not to like asprintf

From

Tom Lane

Date:

22 October 2013, 16:35:45

I wrote:
> I have a lot of other gripes about this whole patch, but they can
> wait till tomorrow.

The other big thing I don't like about this patch is that adopting
asprintf() as a model was not any better thought-out than the
portability considerations were.  It's a bad choice because:

1. asprintf() is not in C99 nor any POSIX standard.

2. The two known implementations, glibc and *BSD, behave differently,
as admitted in the Linux man page.  Thus, there is nothing resembling
a standard here.

3. Neither implementation has given any thought to error reporting;
it's not possible to tell whether a failure was due to out-of-memory
or a vsnprintf failure such as a format-string error.  (pg_asprintf
assumes the former, which probably has something to do with the bogus
out-of-memory failures we're seeing on buildfarm member "anole", though
I'm not sure about the details.)

4. The use of a char** output parameter is not very friendly, because
(at least on older versions of gcc) it disables uninitialized-variable
detection.  That is, if you have code like
char **str;
if (...)   asprintf(&str, ...);else   asprintf(&str, ...);... use str for something ...

many compilers will not be able to check whether you actually assigned
to str in both code paths.

5. The reason for the char** parameter is that the function return
value is needed for error/buffer overrun handling --- but no caller
of pg_asprintf looks at the return value at all.

6. Thus, I much prefer the API design that was used for psprintf,
that is return the allocated buffer pointer as the function result.
It seems like a bad idea to me that we adopted two different APIs
for frontend and backend code; if we're going to hardwire
exit()-on-failure into pg_asprintf there is no benefit whatsoever
to reserving the function return value for failure indications.

In short, I want to change pg_asprintf to return the malloc'd buffer
as its function result.  This probably means we should change the
function name too, since the allusion to asprintf is completely useless
if we do that.  I'm thinking pg_psprintf for the frontend version of
psprintf, but maybe somebody has a better suggestion?
        regards, tom lane

Re: Reasons not to like asprintf

From

Stephen Frost

Date:

22 October 2013, 17:53:10

Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> In short, I want to change pg_asprintf to return the malloc'd buffer
> as its function result.  This probably means we should change the
> function name too, since the allusion to asprintf is completely useless
> if we do that.  I'm thinking pg_psprintf for the frontend version of
> psprintf, but maybe somebody has a better suggestion?

Sounds reasonable, and I haven't got a better name, but I'm trying to
figure out why psprintf hasn't got the same issues which you mention in
your other thread (eg: depending on vsnprintf to return the "would-be"
result size).  I'm also a bit nervous that we only check vsnprintf()
success in Assert-enabled builds in psprintf, though I suppose that's
all we're doing in appendStringInfo too.
Thanks,
    Stephen

Re: Reasons not to like asprintf

From

Tom Lane

Date:

22 October 2013, 18:20:36

Stephen Frost <sfrost@snowman.net> writes:
> Sounds reasonable, and I haven't got a better name, but I'm trying to
> figure out why psprintf hasn't got the same issues which you mention in
> your other thread (eg: depending on vsnprintf to return the "would-be"
> result size).

It does, though not directly in psprintf but rather in the underlying
vasprintf implementation.  (Although psprintf is exposed to the problem
that it can't tell out-of-memory from format errors.)

What I'm on about in this thread is the API used by all the call sites,
not the underlying implementation issues.  As long as we're okay with
"exit or elog on any error", I think we should just have the widely-used
functions returning a buffer pointer.  Underneath that, we need some work
so that we can print more-specific error messages, but that will not be
of concern to the callers.

It's possibly debatable whether exit-on-error is okay or not.  I think
it is though, because none of our frontend code is prepared to do anything
except go belly-up on out-of-memory, while in the backend case it's only
elog(ERROR) anyway.  The other possible class of failures is format string
or encoding errors, but those seem to reduce to the same cases: I can't
envision that frontend code would have any useful recovery strategies.
I'd like the printed error message to distinguish these cases, but I don't
think the callers need to care.

> I'm also a bit nervous that we only check vsnprintf()
> success in Assert-enabled builds in psprintf, though I suppose that's
> all we're doing in appendStringInfo too.

Actually, appendStringInfo treats result < 0 as a buffer overrun report;
if the failure is persistent because it's really a format/encoding
problem, it'll loop till it runs out of memory, and then report the error
as that.  The trouble here is that in pre-C99 implementations of
vsnprintf, return code < 0 might mean either buffer overrun or a
format/encoding problem.  We can't do too much about this unless we are
prepared to move our goalposts about portability assumptions.  (Hmm ...
current POSIX requires *printf to set errno on error, so maybe we could
look at errno?)
        regards, tom lane

Re: Reasons not to like asprintf

From

Stephen Frost

Date:

22 October 2013, 18:45:09

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > Sounds reasonable, and I haven't got a better name, but I'm trying to
> > figure out why psprintf hasn't got the same issues which you mention in
> > your other thread (eg: depending on vsnprintf to return the "would-be"
> > result size).
>
> It does, though not directly in psprintf but rather in the underlying
> vasprintf implementation.  (Although psprintf is exposed to the problem
> that it can't tell out-of-memory from format errors.)

Right, pvsprintf().  Sorry for any confusion.

> What I'm on about in this thread is the API used by all the call sites,
> not the underlying implementation issues.  As long as we're okay with
> "exit or elog on any error", I think we should just have the widely-used
> functions returning a buffer pointer.  Underneath that, we need some work
> so that we can print more-specific error messages, but that will not be
> of concern to the callers.

I agree that exit/Assert-or-elog is the right API here.  I wonder if
it'd be reasonable or worthwhile to try and actually make that happen-
that is to say, we really do only have one implementation for both
front-end and back-end here, but on the front-end we exit, while on the
back-end we elog(ERROR).  I'm guessing that's a pipe-dream, but figured
I'd mention it anyway, in case people have crafty ideas about how that
might be made to work (and if others think it's even a good idea).

> The other possible class of failures is format string
> or encoding errors, but those seem to reduce to the same cases: I can't
> envision that frontend code would have any useful recovery strategies.
> I'd like the printed error message to distinguish these cases, but I don't
> think the callers need to care.

Agreed.

> > I'm also a bit nervous that we only check vsnprintf()
> > success in Assert-enabled builds in psprintf, though I suppose that's
> > all we're doing in appendStringInfo too.
>
> Actually, appendStringInfo treats result < 0 as a buffer overrun report;

I had been looking at the Assert() that the last character is always
'\0' in allocStringInfoVA.  Probably a stretch to say that has to be
upgraded to an elog(), but I'd certainly be much happier with a runtime
"did this error?" check of some kind than not; hence my support for your
errno notion below..

> if the failure is persistent because it's really a format/encoding
> problem, it'll loop till it runs out of memory, and then report the error
> as that.

Ah, yes, I see that in enlargeStringInfo and as long as we keep
MaxAllocSize reasonable that isn't terrible.

> (Hmm ...
> current POSIX requires *printf to set errno on error, so maybe we could
> look at errno?)

Sounds like a good idea to me..  Being explicit about checking for error
results, rather than hoping-against-hope that the only reason we'd ever
get a value less than the total length is when we need to provide more
memory, would be a lot better.  Perhaps it'd be worthwhile to run a test
against the build farm and see what kind of errno's are being set in
those cases where we're now getting the 'out of memory' failures you
mentioned upthread?
Thanks,
    Stephen

Re: Reasons not to like asprintf

From

Tom Lane

Date:

22 October 2013, 19:17:19

Stephen Frost <sfrost@snowman.net> writes:
> I agree that exit/Assert-or-elog is the right API here.  I wonder if
> it'd be reasonable or worthwhile to try and actually make that happen-
> that is to say, we really do only have one implementation for both
> front-end and back-end here, but on the front-end we exit, while on the
> back-end we elog(ERROR).  I'm guessing that's a pipe-dream,

It's not really --- we could make it happen with a single source-file in
libpgcommon, with a few #ifdef FRONTENDs, similarly to exec.c for
instance.  I'm not entirely sure that it's worth the trouble, since once
you take away the error and memory management there's not much left of
these functions; so the #ifdefs might end up being most of the code.
I'm going to go code it shortly, and we'll find out.

> I had been looking at the Assert() that the last character is always
> '\0' in allocStringInfoVA.  Probably a stretch to say that has to be
> upgraded to an elog(), but I'd certainly be much happier with a runtime
> "did this error?" check of some kind than not;

IIRC, that Assert is meant to catch an ancient buggy version of vsnprintf,
so it can't really be replaced by an errno check --- the whole point is
that that vsnprintf failed to recognize it was doing something wrong.
We could upgrade it to an elog; but we had that debate years ago when
the code went in, and I don't think the passage of time has made it more
important rather than less so to have an always-on check for that case.

>> (Hmm ...
>> current POSIX requires *printf to set errno on error, so maybe we could
>> look at errno?)

> Sounds like a good idea to me..  Being explicit about checking for error
> results, rather than hoping-against-hope that the only reason we'd ever
> get a value less than the total length is when we need to provide more
> memory, would be a lot better.  Perhaps it'd be worthwhile to run a test
> against the build farm and see what kind of errno's are being set in
> those cases where we're now getting the 'out of memory' failures you
> mentioned upthread?

Yeah, I'm hoping that reporting errno will give us some more insight,
if anole's problem is still there after we replace the code.
        regards, tom lane

Re: Reasons not to like asprintf

From

Tom Lane

Date:

22 October 2013, 19:40:34

... BTW, another reason to choose identical APIs for frontend and backend
versions of these functions is that it greatly eases use of them in shared
frontend/backend code.  As I notice somebody has *already done* in
common/relpath.c.   I'm not exactly sure how those psprintf calls are
working at all in frontend builds.  Maybe they aren't quite, and that has
something to do with the failures on anole?

In order to avoid having to clutter stuff like that with #ifdef FRONTENDs,
I'm now thinking we should use exactly the same names for the frontend and
backend versions, ie psprintf() and pvsprintf().  The main reason for
considering a pg_ prefix for the frontend versions was to avoid cluttering
application namespace; but it's already the case that we don't expect
libpgcommon to be namespace clean.
        regards, tom lane

Re: Reasons not to like asprintf

From

Stephen Frost

Date:

22 October 2013, 19:47:31

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> ... BTW, another reason to choose identical APIs for frontend and backend
> versions of these functions is that it greatly eases use of them in shared
> frontend/backend code.  As I notice somebody has *already done* in
> common/relpath.c.   I'm not exactly sure how those psprintf calls are
> working at all in frontend builds.  Maybe they aren't quite, and that has
> something to do with the failures on anole?

Seems plausible.

> In order to avoid having to clutter stuff like that with #ifdef FRONTENDs,
> I'm now thinking we should use exactly the same names for the frontend and
> backend versions, ie psprintf() and pvsprintf().  The main reason for
> considering a pg_ prefix for the frontend versions was to avoid cluttering
> application namespace; but it's already the case that we don't expect
> libpgcommon to be namespace clean.

To be honest, I had been assuming we'd use the same names.  As for which
direction to go, I'd personally prefer psprintf but that's just my lazy
developer fingers talking.
Thanks,
    Stephen

Re: Reasons not to like asprintf

From

Alvaro Herrera

Date:

22 October 2013, 19:50:43

Tom Lane wrote:
> ... BTW, another reason to choose identical APIs for frontend and backend
> versions of these functions is that it greatly eases use of them in shared
> frontend/backend code.  As I notice somebody has *already done* in
> common/relpath.c.   I'm not exactly sure how those psprintf calls are
> working at all in frontend builds.

There's psprintf in src/common/fe_memutils.c, too, using the backend's
API, which is why this works.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Reasons not to like asprintf

From

Tom Lane

Date:

22 October 2013, 20:36:31

Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Tom Lane wrote:
>> ... BTW, another reason to choose identical APIs for frontend and backend
>> versions of these functions is that it greatly eases use of them in shared
>> frontend/backend code.  As I notice somebody has *already done* in
>> common/relpath.c.   I'm not exactly sure how those psprintf calls are
>> working at all in frontend builds.

> There's psprintf in src/common/fe_memutils.c, too, using the backend's
> API, which is why this works.

Ah --- that's why it links, anyway.  As for "works", I suspect that the
answer is that anole has a vsnprintf that returns -1 on buffer overrun.
vasprintf and then psprintf will interpret that as "out of memory",
resulting in the observed behavior.  It'd be interesting to see the stack
trace from that error, though, just to confirm this theory.
        regards, tom lane

Re: Reasons not to like asprintf

From

Tom Lane

Date:

22 October 2013, 21:06:10

I wrote:
> Stephen Frost <sfrost@snowman.net> writes:
>> I agree that exit/Assert-or-elog is the right API here.  I wonder if
>> it'd be reasonable or worthwhile to try and actually make that happen-
>> that is to say, we really do only have one implementation for both
>> front-end and back-end here, but on the front-end we exit, while on the
>> back-end we elog(ERROR).  I'm guessing that's a pipe-dream,

> It's not really --- we could make it happen with a single source-file in
> libpgcommon, with a few #ifdef FRONTENDs, similarly to exec.c for
> instance.  I'm not entirely sure that it's worth the trouble, since once
> you take away the error and memory management there's not much left of
> these functions; so the #ifdefs might end up being most of the code.
> I'm going to go code it shortly, and we'll find out.

Attached is a draft, which compiles though I've not yet actually tested it.
It seems pretty reasonable as far as readability goes.  A couple of notes:

1. I omitted pvsprintf(), which we don't actually use anywhere anyway.
I don't see any way to implement that API without va_copy, which is one
of the main things I'm trying to get rid of.  Since we've never needed
a va_args variant in stringinfo.c in all the years we've had that, I
think we can get away with omitting it here.

2. The file includes utils/memutils.h, which I'm not 100% sure is safe
to include in the FRONTEND case.  This is so that it can refer to
MaxAllocSize.  If it turns out that that causes build problems on some
platforms, we could use some more ad-hoc way to define the limit (maybe
just INT_MAX/4?), or we could move the definition of MaxAllocSize into
palloc.h.

Barring objections, I'll push forward with replacing the existing code
with this.

            regards, tom lane

Attachment

psprintf.c

Re: Reasons not to like asprintf

From

Florian Weimer

Date:

23 October 2013, 07:02:38

On 10/22/2013 11:06 PM, Tom Lane wrote:

> Attached is a draft, which compiles though I've not yet actually tested it.
nprinted = vsnprintf(buf, len, fmt, args);
Assert(buf[len - 1] == '\0');

The assert may fire if len > INT_MAX and the system returns with errno 
== EOVERFLOW, as required by POSIX.  It's probably better to move it 
after the error logging.

-- 
Florian Weimer / Red Hat Product Security Team

Re: Reasons not to like asprintf

From

Peter Eisentraut

Date:

24 October 2013, 18:18:15

On 10/22/13, 3:40 PM, Tom Lane wrote:
> In order to avoid having to clutter stuff like that with #ifdef FRONTENDs,
> I'm now thinking we should use exactly the same names for the frontend and
> backend versions, ie psprintf() and pvsprintf().  The main reason for
> considering a pg_ prefix for the frontend versions was to avoid cluttering
> application namespace; but it's already the case that we don't expect
> libpgcommon to be namespace clean.

While this is attractive, the same logic would suggest that we rename
pg_malloc() to palloc(), and that sounds wrong.  The frontend and
backend functions do have different freeing semantics.

Re: Reasons not to like asprintf

From

Tom Lane

Date:

24 October 2013, 18:27:04

Peter Eisentraut <peter_e@gmx.net> writes:
> On 10/22/13, 3:40 PM, Tom Lane wrote:
>> In order to avoid having to clutter stuff like that with #ifdef FRONTENDs,
>> I'm now thinking we should use exactly the same names for the frontend and
>> backend versions, ie psprintf() and pvsprintf().  The main reason for
>> considering a pg_ prefix for the frontend versions was to avoid cluttering
>> application namespace; but it's already the case that we don't expect
>> libpgcommon to be namespace clean.

> While this is attractive, the same logic would suggest that we rename
> pg_malloc() to palloc(), and that sounds wrong.  The frontend and
> backend functions do have different freeing semantics.

We already crossed that bridge, though, by defining "palloc" in frontend
environments to mean pg_malloc.  I'm doubtful that insisting on different
names is going to result in anything except #ifdef clutter.
        regards, tom lane

Re: Reasons not to like asprintf

From

Robert Haas

Date:

24 October 2013, 20:53:22

On Thu, Oct 24, 2013 at 2:18 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
> On 10/22/13, 3:40 PM, Tom Lane wrote:
>> In order to avoid having to clutter stuff like that with #ifdef FRONTENDs,
>> I'm now thinking we should use exactly the same names for the frontend and
>> backend versions, ie psprintf() and pvsprintf().  The main reason for
>> considering a pg_ prefix for the frontend versions was to avoid cluttering
>> application namespace; but it's already the case that we don't expect
>> libpgcommon to be namespace clean.
>
> While this is attractive, the same logic would suggest that we rename
> pg_malloc() to palloc(), and that sounds wrong.  The frontend and
> backend functions do have different freeing semantics.

I'd almost be inclined to go the other way and suggest that we try to
use the pg_ prefix more, at least for things to be shared between
front and back end code.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Reasons not to like asprintf

From

Tom Lane

Date:

25 October 2013, 02:07:41

Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Oct 24, 2013 at 2:18 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
>> While this is attractive, the same logic would suggest that we rename
>> pg_malloc() to palloc(), and that sounds wrong.  The frontend and
>> backend functions do have different freeing semantics.

> I'd almost be inclined to go the other way and suggest that we try to
> use the pg_ prefix more, at least for things to be shared between
> front and back end code.

Meh.  I think that mainly promotes carpal tunnel syndrome.  The place
for a pg_ prefix is in functions we intend to expose to the "outside
world", such as functions exposed by libpq.  But these are not that.
        regards, tom lane