Thread: Re: [HACKERS] snprintf causes regression tests to fail

Re: [HACKERS] snprintf causes regression tests to fail

From
Joerg Hessdoerfer
Date:
Hi,

On Tuesday 01 March 2005 21:44, you wrote:
> >> I spent all day debugging it. Still have absolutely
> >> no idea what could possibly go wrong. Does
> >> anyone have a slightest clue what can it be and
> >> why it manifests itself only on win32?
> >
> >It may be that the CLIB  has badly broken support for 64bit
> >integers on 32
> >bit platforms. Does anyone know of any Cygwin/Ming issues?
> >
> >Is this only with the new snprintf code in Win32?
>
> Yes.
>
> >Is this a problem with snprintf as implemented in src/port?
>
> Yes. Only. It works with the snprintf() in the runtime (this particular
> part).
>
> >Is there a reason why we don't use the snprintf that comes with the
> >various C compilers?
>
> It does not support "positional parameters" (I think it's called) which
> is required for proper translations.
> We do use that one when it works...
>
> //Magnus

Some stupid idea just crossed my mind: what if the /ports version just
re-arranges the va_list according to the positional args and calls
vsnprintf()?
At least we know compiler and library...

Or, another idea: why not format the va_args individually using the original
format specifiers alone (without positional args), and assemble the resulting
string?

Am I on dope? Or does this sound at least doable?

Didn't code too much C lately...

Greetings,
 Jörg
--
Leading SW developer  - S.E.A GmbH
Mail: joerg.hessdoerfer@sea-gmbh.com
WWW:  http://www.sea-gmbh.com

Re: [HACKERS] snprintf causes regression tests to fail

From
Nicolai Tufar
Date:
On Tue, 1 Mar 2005 22:42:52 +0100, Joerg Hessdoerfer
<Joerg.Hessdoerfer@sea-gmbh.com> wrote:
>
> Some stupid idea just crossed my mind: what if the /ports version just
> re-arranges the va_list according to the positional args and calls
> vsnprintf()?
> At least we know compiler and library...

I thought about it a lot. Some platforms do not support
all of % formant strings. src/port/snprintf.c is used both
for the platforms that do not support all necessary
% modifiers and the ones that do not support %n$
modifiers. Here is a comment from configure:

# If we found "long int" is 64 bits, assume snprintf handles it.  If
# we found we need to use "long long int", better check.  We cope with
# snprintfs that use %lld, %qd, or %I64d as the format.  If none of these
# work, fall back to our own snprintf emulation (which we know uses %lld).

Any comments on this idea? Should we have two
versions of snprintf.c for these occasions?

> Or, another idea: why not format the va_args individually using the original
> format specifiers alone (without positional args), and assemble the resulting
> string?

That is exactly what the code does :)

> Greetings,
>  Jörg

Nick

Re: [HACKERS] snprintf causes regression tests to fail

From
Tom Lane
Date:
Joerg Hessdoerfer <Joerg.Hessdoerfer@sea-gmbh.com> writes:
> Some stupid idea just crossed my mind: what if the /ports version just
> re-arranges the va_list according to the positional args and calls
> vsnprintf()?

Having looked at the current snprintf.c, I don't actually believe that
it works at all in the presence of positional parameter specs.  It picks
up the arguments in the order they are mentioned in the format string,
which is exactly not what it's supposed to do, if I understand the
concept of positional specifiers properly.  This is certain to give the
wrong answers when the arguments are of different widths.

I'm also less than thrilled with the 300K+ local array ;-).  I don't see
any point in saving the width specs from the first pass to the second
when they are not needed in the first pass.  NL_ARGMAX could have a more
sane value (surely not more than a couple hundred) and we need some
checks that it's not exceeded in any case.  On the other side of the
coin, the hardwired 4K limit in printf() is certainly *not* enough.

I think a correct implementation is probably a three-pass affair:

1. Scan format string to determine the basic datatypes (int, float, char*,
etc) of each actual parameter and their positions.  Note that runtime
width and precision specs have to be counted as actual parameters.

2. Perform the va_arg() fetches in position order, and stash the actual
parameter values into a local array.

3. Rescan format string and do the outputting.

I don't offhand see what is making the regression tests fail (it's not
the above because the problem occurs with a nonpositional format string);
but there's not much point in trying to get rid of porting issues when
the fundamental algorithm is wrong.

            regards, tom lane

Re: [HACKERS] snprintf causes regression tests to fail

From
Nicolai Tufar
Date:
> Having looked at the current snprintf.c, I don't actually believe that
> it works at all in the presence of positional parameter specs.  It picks
> up the arguments in the order they are mentioned in the format string,
> which is exactly not what it's supposed to do, if I understand the
> concept of positional specifiers properly.  This is certain to give the
> wrong answers when the arguments are of different widths.

It picks up arguments in order of appearance, places them in
array then shuffles them according to %n$ positional parameter.
I checked it with in many different combinations, it works!

> I'm also less than thrilled with the 300K+ local array ;-).  I don't see
> any point in saving the width specs from the first pass to the second
> when they are not needed in the first pass.  NL_ARGMAX could have a more
> sane value (surely not more than a couple hundred) and we need some
> checks that it's not exceeded in any case.  On the other side of the
> coin, the hardwired 4K limit in printf() is certainly *not* enough.

How would one solve this issue. Calling malloc() from a print function
would be rather expensive. Maybe call snprintf() from printf() and
see if resulting string is 4K long then allocate a new buffer and
call snprintf() again? And by the way, what should I use malloc()
or palloc()?

What would you think will be a good value for NL_ARGMAX?

> I think a correct implementation is probably a three-pass affair:
>
> 1. Scan format string to determine the basic datatypes (int, float, char*,
> etc) of each actual parameter and their positions.  Note that runtime
> width and precision specs have to be counted as actual parameters.
>
> 2. Perform the va_arg() fetches in position order, and stash the actual
> parameter values into a local array.

I actually do it. But in one step. I left the scanning loop in place
but replaced calls to actual printing functions with code to fill
the array, preserving width and precision. Plus I store pointers
to beginning and endings of format placeholders to not to have
to scan format string again in the next step.

> 3. Rescan format string and do the outputting.

My version: reorder the array and do the outputting.
/* shuffle pointers */ comment in source is misleading,
it should have been /* reorder pointers */.

> I don't offhand see what is making the regression tests fail (it's not
> the above because the problem occurs with a nonpositional format string);
> but there's not much point in trying to get rid of porting issues when
> the fundamental algorithm is wrong.

I believe my algorithm is working and it is faster than your proposed algorithm.
But if it is not acceptable I am willing to change as deemed necessary. I tested
the function separately and it passed regression tests on many platforms.
Situation with win32 is really unusual. We may need a win32 and MinGG internals
guru to have a look a the issue.

>                       regards, tom lane
Nicolai Tufar

Re: [HACKERS] snprintf causes regression tests to fail

From
Tom Lane
Date:
Nicolai Tufar <ntufar@gmail.com> writes:
>> Having looked at the current snprintf.c, I don't actually believe that
>> it works at all in the presence of positional parameter specs.

> It picks up arguments in order of appearance, places them in
> array then shuffles them according to %n$ positional parameter.
> I checked it with in many different combinations, it works!

Did you try combinations of parameters of different sizes?  For instance

    snprintf(..., "%g %d", doubleval, intval);
and
    snprintf(..., "%2$d %1$g", doubleval, intval);

You cannot pick this up in order of appearance and expect it to work.
It might fail to fail on some machine architectures, but it's not going
to be portable.

>> On the other side of the
>> coin, the hardwired 4K limit in printf() is certainly *not* enough.

> How would one solve this issue. Calling malloc() from a print function
> would be rather expensive.

printf shouldn't use a buffer at all.  I was thinking in terms of
passing around a state struct like

    typedef struct {
        char    *output;
        char    *end;
        FILE    *outfile;
    } printf_target;

and making dopr_outch look like

static void
dopr_outch(int c, printf_target *state)
{
    if (state->output)
    {
        if (state->end == NULL || state->output < state->end)
            *(state->output++) = c;
    }
    else
        fputc(c, state->outfile);
}

            regards, tom lane

Re: [HACKERS] snprintf causes regression tests to fail

From
Nicolai Tufar
Date:
On Tue, 01 Mar 2005 18:15:49 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Nicolai Tufar <ntufar@gmail.com> writes:
> >> Having looked at the current snprintf.c, I don't actually believe that
> >> it works at all in the presence of positional parameter specs.
>
> > It picks up arguments in order of appearance, places them in
> > array then shuffles them according to %n$ positional parameter.
> > I checked it with in many different combinations, it works!
>
> Did you try combinations of parameters of different sizes?  For instance
>
>         snprintf(..., "%g %d", doubleval, intval);
> and
>         snprintf(..., "%2$d %1$g", doubleval, intval);
>

It works perfectly fine. Just checked.

> You cannot pick this up in order of appearance and expect it to work.
> It might fail to fail on some machine architectures, but it's not going
> to be portable.

I did not modify the code that picked up the values.
I just modified it to instead of passing to printing function
store values in array.

> >> On the other side of the
> >> coin, the hardwired 4K limit in printf() is certainly *not* enough.
>
> > How would one solve this issue. Calling malloc() from a print function
> > would be rather expensive.
>
> printf shouldn't use a buffer at all.  I was thinking in terms of
> passing around a state struct like
>
>         typedef struct {
>                 char    *output;
>                 char    *end;
>                 FILE    *outfile;
>         } printf_target;
>
> and making dopr_outch look like
>
> static void
> dopr_outch(int c, printf_target *state)
> {
>         if (state->output)
>         {
>                 if (state->end == NULL || state->output < state->end)
>                         *(state->output++) = c;
>         }
>         else
>                 fputc(c, state->outfile);
> }

I will consider it tomorrow. It is too late on this side of
the pond, I need to sleep. Thank you for your attention
and help.

best regards,
Nicolai Tufar

Re: [HACKERS] snprintf causes regression tests to fail

From
Tom Lane
Date:
Nicolai Tufar <ntufar@gmail.com> writes:
> On Tue, 01 Mar 2005 18:15:49 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Did you try combinations of parameters of different sizes?  For instance
>>
>> snprintf(..., "%g %d", doubleval, intval);
>> and
>> snprintf(..., "%2$d %1$g", doubleval, intval);

> It works perfectly fine. Just checked.

Well, whatever architecture you're checking on may happen to make it
work, but that does *not* mean it's portable.  On my machine it fails:

int main()
{
    char buf[1000];
    double d = 3.14159265358979;
    int i = 42;

    snprintf(buf, sizeof buf, "%.15g %d", d, i);
    printf("result = '%s'\n", buf);
    snprintf(buf, sizeof buf, "%2$d %1$.15g", d, i);
    printf("result = '%s'\n", buf);
    return 0;
}

"Correct" output with the system snprintf is

result = '3.14159265358979 42'
result = '42 3.14159265358979'

With CVS-tip snprintf I get

result = '3 42'
result = '3 3505'

The second value in the last line varies erratically from run to run,
presumably because it's picking up garbage.  This output appears to
indicate some problems in passing around precision specs as well as the
values themselves...

            regards, tom lane

Re: [HACKERS] snprintf causes regression tests to fail

From
Tom Lane
Date:
BTW, you should read the official spec for snprintf:

http://www.opengroup.org/onlinepubs/007908799/xsh/fprintf.html

There are a couple of interesting things that the present code is most
certainly not doing correctly, notably:

     In format strings containing the %n$ form of conversion
     specifications, numbered arguments in the argument list can be
     referenced from the format string as many times as required.

Also it seems that runtime precision specs are required to have explicit
numbers when used in a %n$ string, which is something I didn't know
until just now.  This example in the spec is instructive:
    printf("%1$d:%2$.*3$d:%4$.*3$d\n", hour, min, precision, sec);

It might be a good idea to go look at whichever *BSD we got this code
from originally, and see if they've upgraded it to do %n$.  'Cause it
will take a nontrivial amount of work to get from where we are now to
something that follows the full spec.

            regards, tom lane

Re: [HACKERS] snprintf causes regression tests to fail

From
Joerg Hessdoerfer
Date:
Hi,

On Wednesday 02 March 2005 01:28, you wrote:
> BTW, you should read the official spec for snprintf:
>
> http://www.opengroup.org/onlinepubs/007908799/xsh/fprintf.html
>
> There are a couple of interesting things that the present code is most
> certainly not doing correctly, notably:
>
>      In format strings containing the %n$ form of conversion
>      specifications, numbered arguments in the argument list can be
>      referenced from the format string as many times as required.
>
> Also it seems that runtime precision specs are required to have explicit
> numbers when used in a %n$ string, which is something I didn't know
> until just now.  This example in the spec is instructive:
>  printf("%1$d:%2$.*3$d:%4$.*3$d\n", hour, min, precision, sec);
>
> It might be a good idea to go look at whichever *BSD we got this code
> from originally, and see if they've upgraded it to do %n$.  'Cause it
> will take a nontrivial amount of work to get from where we are now to
> something that follows the full spec.
>

don't know if PG borrowed the code from here, but according to the manpage
FreeBSD 5.3 seems to have a quite complete implementation, see
http://www.freebsd.org/cgi/man.cgi?query=snprintf&apropos=0&sektion=3&manpath=FreeBSD+5.3-RELEASE+and+Ports&format=html



Would this help?

Greetings,
 Jörg
--
Leading SW developer  - S.E.A GmbH
Mail: joerg.hessdoerfer@sea-gmbh.com
WWW:  http://www.sea-gmbh.com