Thread: Re: [HACKERS] snprintf causes regression tests to fail
Hi, On Tuesday 01 March 2005 21:44, you wrote: > >> I spent all day debugging it. Still have absolutely > >> no idea what could possibly go wrong. Does > >> anyone have a slightest clue what can it be and > >> why it manifests itself only on win32? > > > >It may be that the CLIB has badly broken support for 64bit > >integers on 32 > >bit platforms. Does anyone know of any Cygwin/Ming issues? > > > >Is this only with the new snprintf code in Win32? > > Yes. > > >Is this a problem with snprintf as implemented in src/port? > > Yes. Only. It works with the snprintf() in the runtime (this particular > part). > > >Is there a reason why we don't use the snprintf that comes with the > >various C compilers? > > It does not support "positional parameters" (I think it's called) which > is required for proper translations. > We do use that one when it works... > > //Magnus Some stupid idea just crossed my mind: what if the /ports version just re-arranges the va_list according to the positional args and calls vsnprintf()? At least we know compiler and library... Or, another idea: why not format the va_args individually using the original format specifiers alone (without positional args), and assemble the resulting string? Am I on dope? Or does this sound at least doable? Didn't code too much C lately... Greetings, Jörg -- Leading SW developer - S.E.A GmbH Mail: joerg.hessdoerfer@sea-gmbh.com WWW: http://www.sea-gmbh.com
On Tue, 1 Mar 2005 22:42:52 +0100, Joerg Hessdoerfer <Joerg.Hessdoerfer@sea-gmbh.com> wrote: > > Some stupid idea just crossed my mind: what if the /ports version just > re-arranges the va_list according to the positional args and calls > vsnprintf()? > At least we know compiler and library... I thought about it a lot. Some platforms do not support all of % formant strings. src/port/snprintf.c is used both for the platforms that do not support all necessary % modifiers and the ones that do not support %n$ modifiers. Here is a comment from configure: # If we found "long int" is 64 bits, assume snprintf handles it. If # we found we need to use "long long int", better check. We cope with # snprintfs that use %lld, %qd, or %I64d as the format. If none of these # work, fall back to our own snprintf emulation (which we know uses %lld). Any comments on this idea? Should we have two versions of snprintf.c for these occasions? > Or, another idea: why not format the va_args individually using the original > format specifiers alone (without positional args), and assemble the resulting > string? That is exactly what the code does :) > Greetings, > Jörg Nick
Joerg Hessdoerfer <Joerg.Hessdoerfer@sea-gmbh.com> writes: > Some stupid idea just crossed my mind: what if the /ports version just > re-arranges the va_list according to the positional args and calls > vsnprintf()? Having looked at the current snprintf.c, I don't actually believe that it works at all in the presence of positional parameter specs. It picks up the arguments in the order they are mentioned in the format string, which is exactly not what it's supposed to do, if I understand the concept of positional specifiers properly. This is certain to give the wrong answers when the arguments are of different widths. I'm also less than thrilled with the 300K+ local array ;-). I don't see any point in saving the width specs from the first pass to the second when they are not needed in the first pass. NL_ARGMAX could have a more sane value (surely not more than a couple hundred) and we need some checks that it's not exceeded in any case. On the other side of the coin, the hardwired 4K limit in printf() is certainly *not* enough. I think a correct implementation is probably a three-pass affair: 1. Scan format string to determine the basic datatypes (int, float, char*, etc) of each actual parameter and their positions. Note that runtime width and precision specs have to be counted as actual parameters. 2. Perform the va_arg() fetches in position order, and stash the actual parameter values into a local array. 3. Rescan format string and do the outputting. I don't offhand see what is making the regression tests fail (it's not the above because the problem occurs with a nonpositional format string); but there's not much point in trying to get rid of porting issues when the fundamental algorithm is wrong. regards, tom lane
> Having looked at the current snprintf.c, I don't actually believe that > it works at all in the presence of positional parameter specs. It picks > up the arguments in the order they are mentioned in the format string, > which is exactly not what it's supposed to do, if I understand the > concept of positional specifiers properly. This is certain to give the > wrong answers when the arguments are of different widths. It picks up arguments in order of appearance, places them in array then shuffles them according to %n$ positional parameter. I checked it with in many different combinations, it works! > I'm also less than thrilled with the 300K+ local array ;-). I don't see > any point in saving the width specs from the first pass to the second > when they are not needed in the first pass. NL_ARGMAX could have a more > sane value (surely not more than a couple hundred) and we need some > checks that it's not exceeded in any case. On the other side of the > coin, the hardwired 4K limit in printf() is certainly *not* enough. How would one solve this issue. Calling malloc() from a print function would be rather expensive. Maybe call snprintf() from printf() and see if resulting string is 4K long then allocate a new buffer and call snprintf() again? And by the way, what should I use malloc() or palloc()? What would you think will be a good value for NL_ARGMAX? > I think a correct implementation is probably a three-pass affair: > > 1. Scan format string to determine the basic datatypes (int, float, char*, > etc) of each actual parameter and their positions. Note that runtime > width and precision specs have to be counted as actual parameters. > > 2. Perform the va_arg() fetches in position order, and stash the actual > parameter values into a local array. I actually do it. But in one step. I left the scanning loop in place but replaced calls to actual printing functions with code to fill the array, preserving width and precision. Plus I store pointers to beginning and endings of format placeholders to not to have to scan format string again in the next step. > 3. Rescan format string and do the outputting. My version: reorder the array and do the outputting. /* shuffle pointers */ comment in source is misleading, it should have been /* reorder pointers */. > I don't offhand see what is making the regression tests fail (it's not > the above because the problem occurs with a nonpositional format string); > but there's not much point in trying to get rid of porting issues when > the fundamental algorithm is wrong. I believe my algorithm is working and it is faster than your proposed algorithm. But if it is not acceptable I am willing to change as deemed necessary. I tested the function separately and it passed regression tests on many platforms. Situation with win32 is really unusual. We may need a win32 and MinGG internals guru to have a look a the issue. > regards, tom lane Nicolai Tufar
Nicolai Tufar <ntufar@gmail.com> writes: >> Having looked at the current snprintf.c, I don't actually believe that >> it works at all in the presence of positional parameter specs. > It picks up arguments in order of appearance, places them in > array then shuffles them according to %n$ positional parameter. > I checked it with in many different combinations, it works! Did you try combinations of parameters of different sizes? For instance snprintf(..., "%g %d", doubleval, intval); and snprintf(..., "%2$d %1$g", doubleval, intval); You cannot pick this up in order of appearance and expect it to work. It might fail to fail on some machine architectures, but it's not going to be portable. >> On the other side of the >> coin, the hardwired 4K limit in printf() is certainly *not* enough. > How would one solve this issue. Calling malloc() from a print function > would be rather expensive. printf shouldn't use a buffer at all. I was thinking in terms of passing around a state struct like typedef struct { char *output; char *end; FILE *outfile; } printf_target; and making dopr_outch look like static void dopr_outch(int c, printf_target *state) { if (state->output) { if (state->end == NULL || state->output < state->end) *(state->output++) = c; } else fputc(c, state->outfile); } regards, tom lane
On Tue, 01 Mar 2005 18:15:49 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Nicolai Tufar <ntufar@gmail.com> writes: > >> Having looked at the current snprintf.c, I don't actually believe that > >> it works at all in the presence of positional parameter specs. > > > It picks up arguments in order of appearance, places them in > > array then shuffles them according to %n$ positional parameter. > > I checked it with in many different combinations, it works! > > Did you try combinations of parameters of different sizes? For instance > > snprintf(..., "%g %d", doubleval, intval); > and > snprintf(..., "%2$d %1$g", doubleval, intval); > It works perfectly fine. Just checked. > You cannot pick this up in order of appearance and expect it to work. > It might fail to fail on some machine architectures, but it's not going > to be portable. I did not modify the code that picked up the values. I just modified it to instead of passing to printing function store values in array. > >> On the other side of the > >> coin, the hardwired 4K limit in printf() is certainly *not* enough. > > > How would one solve this issue. Calling malloc() from a print function > > would be rather expensive. > > printf shouldn't use a buffer at all. I was thinking in terms of > passing around a state struct like > > typedef struct { > char *output; > char *end; > FILE *outfile; > } printf_target; > > and making dopr_outch look like > > static void > dopr_outch(int c, printf_target *state) > { > if (state->output) > { > if (state->end == NULL || state->output < state->end) > *(state->output++) = c; > } > else > fputc(c, state->outfile); > } I will consider it tomorrow. It is too late on this side of the pond, I need to sleep. Thank you for your attention and help. best regards, Nicolai Tufar
Nicolai Tufar <ntufar@gmail.com> writes: > On Tue, 01 Mar 2005 18:15:49 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Did you try combinations of parameters of different sizes? For instance >> >> snprintf(..., "%g %d", doubleval, intval); >> and >> snprintf(..., "%2$d %1$g", doubleval, intval); > It works perfectly fine. Just checked. Well, whatever architecture you're checking on may happen to make it work, but that does *not* mean it's portable. On my machine it fails: int main() { char buf[1000]; double d = 3.14159265358979; int i = 42; snprintf(buf, sizeof buf, "%.15g %d", d, i); printf("result = '%s'\n", buf); snprintf(buf, sizeof buf, "%2$d %1$.15g", d, i); printf("result = '%s'\n", buf); return 0; } "Correct" output with the system snprintf is result = '3.14159265358979 42' result = '42 3.14159265358979' With CVS-tip snprintf I get result = '3 42' result = '3 3505' The second value in the last line varies erratically from run to run, presumably because it's picking up garbage. This output appears to indicate some problems in passing around precision specs as well as the values themselves... regards, tom lane
BTW, you should read the official spec for snprintf: http://www.opengroup.org/onlinepubs/007908799/xsh/fprintf.html There are a couple of interesting things that the present code is most certainly not doing correctly, notably: In format strings containing the %n$ form of conversion specifications, numbered arguments in the argument list can be referenced from the format string as many times as required. Also it seems that runtime precision specs are required to have explicit numbers when used in a %n$ string, which is something I didn't know until just now. This example in the spec is instructive: printf("%1$d:%2$.*3$d:%4$.*3$d\n", hour, min, precision, sec); It might be a good idea to go look at whichever *BSD we got this code from originally, and see if they've upgraded it to do %n$. 'Cause it will take a nontrivial amount of work to get from where we are now to something that follows the full spec. regards, tom lane
Hi, On Wednesday 02 March 2005 01:28, you wrote: > BTW, you should read the official spec for snprintf: > > http://www.opengroup.org/onlinepubs/007908799/xsh/fprintf.html > > There are a couple of interesting things that the present code is most > certainly not doing correctly, notably: > > In format strings containing the %n$ form of conversion > specifications, numbered arguments in the argument list can be > referenced from the format string as many times as required. > > Also it seems that runtime precision specs are required to have explicit > numbers when used in a %n$ string, which is something I didn't know > until just now. This example in the spec is instructive: > printf("%1$d:%2$.*3$d:%4$.*3$d\n", hour, min, precision, sec); > > It might be a good idea to go look at whichever *BSD we got this code > from originally, and see if they've upgraded it to do %n$. 'Cause it > will take a nontrivial amount of work to get from where we are now to > something that follows the full spec. > don't know if PG borrowed the code from here, but according to the manpage FreeBSD 5.3 seems to have a quite complete implementation, see http://www.freebsd.org/cgi/man.cgi?query=snprintf&apropos=0&sektion=3&manpath=FreeBSD+5.3-RELEASE+and+Ports&format=html Would this help? Greetings, Jörg -- Leading SW developer - S.E.A GmbH Mail: joerg.hessdoerfer@sea-gmbh.com WWW: http://www.sea-gmbh.com