Thread: A couple of gripes about the gettext plurals patch

A couple of gripes about the gettext plurals patch

From
Tom Lane
Date:
I see that the recently committed changes typically use ngettext
in this style:
       ereport(msglevel,               /* translator: %d always has a value larger than 1 */
(errmsg(ngettext("dropcascades to %d other object",                                "drop cascades to %d other objects",
                              numReportedClient + numNotReportedClient),                       numReportedClient +
numNotReportedClient),

This is bogus: errmsg expects that it should itself feed its first
argument through gettext(), not receive an argument that is already
translated.  That's at the least a waste of cycles, and it's not
entirely impossible that double translation could end up with a just
plain wrong result.

A simple fix would be to use errmsg_internal() in these cases, but I
wonder if we should instead invent nerrmsg() or something like that.
Anyway, there are some other usages besides errmsg(ngettext()) that
are broken in the same way and will also require consideration.

I'm also wondering whether PGAC_CHECK_GETTEXT() should be made to
check for ngettext() instead of bind_textdomain_codeset().  Which
one was added later?
        regards, tom lane


Re: A couple of gripes about the gettext plurals patch

From
Peter Eisentraut
Date:
On Sunday 26 April 2009 21:29:20 Tom Lane wrote:
>         ereport(msglevel,
>                 /* translator: %d always has a value larger than 1 */
>                 (errmsg(ngettext("drop cascades to %d other object",
>                                  "drop cascades to %d other objects",
>                                  numReportedClient + numNotReportedClient),
>                         numReportedClient + numNotReportedClient),
>
> This is bogus: errmsg expects that it should itself feed its first
> argument through gettext(), not receive an argument that is already
> translated.  That's at the least a waste of cycles, and it's not
> entirely impossible that double translation could end up with a just
> plain wrong result.

I think we can live with this for now, and we have lived with this sort of 
issue in other places for a while.  We should consider reshuffling the 
interfaces a bit as you describe to reduce the problem.  But I think this is 
not something we always avoid completely.

> I'm also wondering whether PGAC_CHECK_GETTEXT() should be made to
> check for ngettext() instead of bind_textdomain_codeset().  Which
> one was added later?

I checked this in the gettext changelog, and bind_textdomain_code() came 
(slightly) later, so we're OK.


Re: A couple of gripes about the gettext plurals patch

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> On Sunday 26 April 2009 21:29:20 Tom Lane wrote:
>> This is bogus: errmsg expects that it should itself feed its first
>> argument through gettext(), not receive an argument that is already
>> translated.  That's at the least a waste of cycles, and it's not
>> entirely impossible that double translation could end up with a just
>> plain wrong result.

> I think we can live with this for now, and we have lived with this sort of 
> issue in other places for a while.  We should consider reshuffling the 
> interfaces a bit as you describe to reduce the problem.

The issue of double translation is really a minor point; what is
bothering me is that we've got such an ad-hoc,
non-compile-time-checkable approach here.  Zdenek's discovery
today that some of the format strings are flat-out wrong
http://archives.postgresql.org/pgsql-hackers/2009-05/msg00946.php
surprises me not in the least.

I think that we need to have a more carefully designed interface
before we allow any more plural translations to be put in place,
not after.
        regards, tom lane


Re: A couple of gripes about the gettext plurals patch

From
Peter Eisentraut
Date:
On Monday 25 May 2009 22:02:47 Tom Lane wrote:
> The issue of double translation is really a minor point; what is
> bothering me is that we've got such an ad-hoc,
> non-compile-time-checkable approach here.  Zdenek's discovery
> today that some of the format strings are flat-out wrong
> http://archives.postgresql.org/pgsql-hackers/2009-05/msg00946.php
> surprises me not in the least.

See response there why this is the way it is.

Note also that gcc's format argument checking can see through ngettext() quite 
well, so as far as I can tell, we are not exposed to accidental format string 
mismatches.

Example code:

#include <stdarg.h>
#include <stdio.h>
#include <libintl.h>

extern void errmsg(const char *fmt, ...) __attribute__((format(printf,1,2)));

void
errmsg(const char *fmt, ...)
{       va_list ap;
       va_start(ap, fmt);       vfprintf(stderr, fmt, ap);       va_end(ap);
}


int
main(int argc, char *argv[])
{       errmsg(ngettext("got %d argument, namely %s\n",                       "got %d arguments, first ist %s\n",
argc),                      argc, argv[0]);       return 0;
 
}

I tried throwing various kinds of subtle garbage into the errmsg/ngettext 
line, but it was all discovered by gcc -Wall.



Re: A couple of gripes about the gettext plurals patch

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> I tried throwing various kinds of subtle garbage into the errmsg/ngettext 
> line, but it was all discovered by gcc -Wall.

I experimented with this and found that indeed both format strings are
checked ... if you have a reasonably recent libintl.h AND you have
specified --enable-nls.  Otherwise it all goes to heck, apparently
because the compiler doesn't try to look through our substitute
definition

#define ngettext(s,p,n) ((n) == 1 ? (s) : (p))

So I'm still of the opinion that we need some work here.  I think
that instead of this #define we need an actual function that we can
hang a couple of __attribute_format_arg__ markers on.  Otherwise
things are going to slip by us.  (Not sure about you, but I don't
build with --enable-nls by default.)
        regards, tom lane


Re: A couple of gripes about the gettext plurals patch

From
Bruce Momjian
Date:
Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > I tried throwing various kinds of subtle garbage into the errmsg/ngettext 
> > line, but it was all discovered by gcc -Wall.
> 
> I experimented with this and found that indeed both format strings are
> checked ... if you have a reasonably recent libintl.h AND you have
> specified --enable-nls.  Otherwise it all goes to heck, apparently
> because the compiler doesn't try to look through our substitute
> definition
> 
> #define ngettext(s,p,n) ((n) == 1 ? (s) : (p))
> 
> So I'm still of the opinion that we need some work here.  I think
> that instead of this #define we need an actual function that we can
> hang a couple of __attribute_format_arg__ markers on.  Otherwise
> things are going to slip by us.  (Not sure about you, but I don't
> build with --enable-nls by default.)

TODO item?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: A couple of gripes about the gettext plurals patch

From
Peter Eisentraut
Date:
On Tuesday 26 May 2009 21:05:29 Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > I tried throwing various kinds of subtle garbage into the errmsg/ngettext
> > line, but it was all discovered by gcc -Wall.
>
> I experimented with this and found that indeed both format strings are
> checked ... if you have a reasonably recent libintl.h AND you have
> specified --enable-nls.  Otherwise it all goes to heck, apparently
> because the compiler doesn't try to look through our substitute
> definition
>
> #define ngettext(s,p,n) ((n) == 1 ? (s) : (p))

I can't reproduce that.  Do you have a concrete example?  Different compiler 
versions, perhaps?

> So I'm still of the opinion that we need some work here.  I think
> that instead of this #define we need an actual function that we can
> hang a couple of __attribute_format_arg__ markers on.

That would appear to be the logical solution, but since I can't reproduce the 
problem, I will have trouble to work on this.


Re: A couple of gripes about the gettext plurals patch

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> On Tuesday 26 May 2009 21:05:29 Tom Lane wrote:
>> I experimented with this and found that indeed both format strings are
>> checked ... if you have a reasonably recent libintl.h AND you have
>> specified --enable-nls.  Otherwise it all goes to heck, apparently
>> because the compiler doesn't try to look through our substitute
>> definition
>> 
>> #define ngettext(s,p,n) ((n) == 1 ? (s) : (p))

> I can't reproduce that.  Do you have a concrete example?  Different compiler 
> versions, perhaps?

Hmm, I was experimenting with Fedora 10's gcc (4.3.2) ... what are you
testing?

I don't have the test case anymore but will see if I can reconstruct it.
        regards, tom lane


Re: A couple of gripes about the gettext plurals patch

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> On Tuesday 26 May 2009 21:05:29 Tom Lane wrote:
>> I experimented with this and found that indeed both format strings are
>> checked ... if you have a reasonably recent libintl.h AND you have
>> specified --enable-nls.  Otherwise it all goes to heck, apparently
>> because the compiler doesn't try to look through our substitute
>> definition
>> 
>> #define ngettext(s,p,n) ((n) == 1 ? (s) : (p))

> I can't reproduce that.  Do you have a concrete example?  Different compiler 
> versions, perhaps?

After further experimentation I think I must have gotten confused
yesterday.  gcc 4.3.2 does seem to understand that it should check both
format strings in the ?: construct.  What was not making that check is
the relic 2.95.3 version that I use for trailing-edge compatibility
checking.  However, 2.95.3 appears to handle only one format_arg()
attribute per function, and so the proposed alternative coding doesn't
help it very much either.  Unless there's some intermediate version
that can do two format_arg()s but doesn't look through ?:, it's
probably not worth changing.

So on the whole I withdraw that line of complaint.  But I'm still
not happy, because I've thought of another one ;-)  To wit, the current
coding fails to respect the gettext domain when working with pluralized
messages.
        regards, tom lane


Re: A couple of gripes about the gettext plurals patch

From
Peter Eisentraut
Date:
On Thursday 28 May 2009 00:54:32 Tom Lane wrote:
> To wit, the current
> coding fails to respect the gettext domain when working with pluralized
> messages.

The ngettext() calls use the default textdomain that main.c sets up.  The PLs 
use dngettext().  Is that not correct?


Re: A couple of gripes about the gettext plurals patch

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> On Thursday 28 May 2009 00:54:32 Tom Lane wrote:
>> To wit, the current
>> coding fails to respect the gettext domain when working with pluralized
>> messages.

> The ngettext() calls use the default textdomain that main.c sets up.  The PLs
> use dngettext().  Is that not correct?

If that's okay, why didn't we adopt that approach for the mainline
errmsg processing?  Or more to the point: I think it's a seriously bad
idea that ereports in PLs need to be coded differently from those in
the core backend, especially with respect to a relatively-little-used
feature.  Want to make a side bet on how long till the first bug gets
committed?
        regards, tom lane