Thread: Improving the ngettext() patch

Improving the ngettext() patch

From

Tom Lane

Date:

03 June 2009, 17:50:55

After looking through the current uses of ngettext(), I think that it
wouldn't be too difficult to modify the patch to address the concerns
I had about it.  What I propose doing is to add an additional elog.h
function

    errmsg_plural(const char *fmt_singular, const char *fmt_plural,
                  unsigned long n, ...)

and replace the current errmsg(ngettext(...)) calls with this.
Similarly add errdetail_plural to replace errdetail(ngettext(...)).
(We could also add errhint_plural and so on, but right offhand these
seem unlikely to be useful.)  The advantage of doing this is that
we avoid double translation and eliminate the current kluge whereby
usages in PL code have to be different from usages anywhere else.
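
To make the shape of the proposal concrete, here is a minimal standalone sketch, not the actual elog.c code. The name format_plural() and the caller-supplied buffer are assumptions made for illustration; the real errmsg_plural() would store its result in the error data instead. The point it shows is that the format string is selected by ngettext() exactly once and is never passed through a second translation lookup.

```c
#include <libintl.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the proposed errmsg_plural(): pick the
 * singular or plural format with a single ngettext() call, then expand
 * the chosen format into the caller's buffer.
 */
static int
format_plural(char *buf, size_t buflen,
              const char *fmt_singular, const char *fmt_plural,
              unsigned long n, ...)
{
    va_list     args;
    const char *fmt;
    int         len;

    /* One translation lookup; the result is not translated again. */
    fmt = ngettext(fmt_singular, fmt_plural, n);

    va_start(args, n);
    len = vsnprintf(buf, buflen, fmt, args);
    va_end(args);

    return len;
}
```

A caller would pass the count twice, once as n to select the form and once as the value for the format specifier, just as the existing errmsg(ngettext(...)) calls do.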

I don't feel a need to touch the usages in client programs (pg_dump and
so on).  In principle the double-translation risk still exists there,
but it seems much less likely to be a real hazard because any one client
program has a *far* smaller pool of translatable messages than the
backend does.  Also, there's only one active text domain in a client
program, so the problem of needing to use dngettext in special cases
doesn't exist.

There are a few usages of ngettext() in the backend that are not tied
to ereport calls, but I think they can be left as-is.  There's no
double-translation risk, and with so few of them I don't see much of
a risk of wrongly copying the usage in PL code, either.


Also: one thought that came to me while looking at the existing usages
is that there are several places that are plural-ized that seem
completely pointless; why are we making our translators work
harder on them?  For example
        ereport(ERROR,
                (errcode(ERRCODE_TOO_MANY_ARGUMENTS),
                 errmsg(ngettext("functions cannot have more than %d argument",
                                 "functions cannot have more than %d arguments",
                                 FUNC_MAX_ARGS),
                        FUNC_MAX_ARGS)));

It seems extremely far-fetched that FUNC_MAX_ARGS would ever be small
enough that it would make any language's special cases kick in.  Or
how about this one:

#if 0
    write_msg(modulename, ngettext("read %lu byte into lookahead buffer\n",
                                   "read %lu bytes into lookahead buffer\n",
                                   AH->lookaheadLen),
              (unsigned long) AH->lookaheadLen);
#endif

I'm not sure why this debug support is still there at all, but surely
it's a crummy candidate for making translators sweat over.  So I'd like
to revert these.

Comments, objections?
        regards, tom lane

Re: Improving the ngettext() patch

From

Sergey Burladyan

Date:

03 June 2009, 22:34:06

Tom Lane <tgl@sss.pgh.pa.us> writes:

>         ereport(ERROR,
>                 (errcode(ERRCODE_TOO_MANY_ARGUMENTS),
>                  errmsg(ngettext("functions cannot have more than %d argument",
>                                  "functions cannot have more than %d arguments",
>                                  FUNC_MAX_ARGS),
>                         FUNC_MAX_ARGS)));
> 
> It seems extremely far-fetched that FUNC_MAX_ARGS would ever be small
> enough that it would make any language's special cases kick in.

Russian plural forms for 100, 101, 102 etc. is different, as for 0, 1, 2.

-- 
Sergey Burladyan

Re: Improving the ngettext() patch

From

pg@thetdh.com

Date:

04 June 2009, 08:57:29

> Russian plural forms for 100, 101, 102 etc. is different, as for 0, 1, 2.

True.  The rule IIRC is that except for 11-14 and for collective numerals,
declination follows the last digit.

It would be possible to generalize declination via a language-specific
message-selector function, especially if the number of numerical
complements were limited to 1.

How awkward would it be to re-word the style of messages to avoid
declination? For example, the Russian equivalent of "X rows" could be
something like "#rows -- X".

David Hudson

Re: Improving the ngettext() patch

From

Tom Lane

Date:

04 June 2009, 11:21:35

pg@thetdh.com writes:
>> Russian plural forms for 100, 101, 102 etc. is different, as for 0, 1, 2.

> True. The rule IIRC is that except for 11-14 and for collective numerals, declination follows the last digit.

Wow.  So how does anyone represent that in the .po files?  AFAICT the
notation the gettext machinery provides isn't really powerful enough
for this.
        regards, tom lane

Re: Improving the ngettext() patch

From

Aidan Van Dyk

Date:

04 June 2009, 11:45:46

* Tom Lane <tgl@sss.pgh.pa.us> [090604 10:22]:
> pg@thetdh.com writes:
> >> Russian plural forms for 100, 101, 102 etc. is different, as for 0, 1, 2.
> 
> > True. The rule IIRC is that except for 11-14 and for collective numerals, declination follows the last digit.
> 
> Wow.  So how does anyone represent that in the .po files?  AFAICT the
> notation the gettext machinery provides isn't really powerful enough
> for this.

Well, the C/English "template" one includes just the msgid and
msgid_plural strings.

When the Russian translators get to it, they make a Russian .po which
contains (something like) the following in the msgid "" header:

    "Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"

And then they provide msgstr[0], msgstr[1], and msgstr[2] to fill the 3
slots that the above Plural-Forms rule can select when translating
plural-form strings.

It's all encapsulated in the gettext tools and libraries. The C
(non-translated) base just always uses ngettext(singular, plural, n), and
ngettext will use whatever plural rule the compiled catalog specifies,
or fall back to the simple n == 1 ? singular : plural
type choice when no translated catalog is available.
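
As a rough illustration of how that Plural-Forms expression maps a count to a msgstr[] slot, here is the same rule written out as a plain C function, next to the fallback rule. The function names are made up for this sketch; gettext evaluates the expression from the catalog header at run time and does not call anything like this.

```c
/*
 * Index of the msgstr[] slot for count n, transcribed from the Russian
 * Plural-Forms expression quoted above.
 */
static unsigned long
russian_plural_index(unsigned long n)
{
    return (n % 10 == 1 && n % 100 != 11) ? 0
        : (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20)) ? 1
        : 2;
}

/*
 * The fallback when no translated catalog is available: slot 0 stands
 * for the singular string, slot 1 for the plural string.
 */
static unsigned long
fallback_plural_index(unsigned long n)
{
    return (n == 1) ? 0 : 1;
}
```

Under this rule 1, 21 and 101 select slot 0; 2, 22 and 102 select slot 1; and 0, 5, 11, 100 and 112 select slot 2. So even a large fixed count still has to be matched to the right form.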

a.

-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: Improving the ngettext() patch

From

Tom Lane

Date:

04 June 2009, 11:52:36

Aidan Van Dyk <aidan@highrise.ca> writes:
> When the Russian translators get to it, they make a Russian .po which
> contains (something like) the following in the msgid "" header:
>     "Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"

Oh, I see.  I didn't realize there was a mapping mechanism available
to the translator.

Okay, so the bottom line there is that there is some value in
pluralizing the messages about FUNC_MAX_ARGS --- I withdraw the
suggestion to undo that.  Anyone wish to defend the ones that
are ifdef'd out?
        regards, tom lane

Re: Improving the ngettext() patch

From

pg@thetdh.com

Date:

05 June 2009, 10:21:38

(Grrr, declension, not declination.)

> "Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"

Thanks.  The above (ignoring backslash-EOL) is the form recommended for
Russian (inter alia(s)) in the Texinfo manual for gettext ("info
gettext"). FWIW this might be an alternative:

"Plural-Forms: nplurals=3; plural=((n - 1) % 10) >= (5-1) || (((n - 1) % 100) <= (14-1) && ((n - 1) % 100) >= (11 - 1)) ? 2 : ((n - 1) % 10) == (1 - 1) ? 0 : 1;\n"

David Hudson
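
One way to sanity-check an alternative like this is to transcribe both expressions into C and compare them over a range of counts. The sketch below does that. The function names are invented for the example, and n is taken as unsigned long on the assumption that this matches how gettext evaluates the expression.

```c
/* The Plural-Forms expression recommended for Russian, as quoted above. */
static unsigned long
plural_standard(unsigned long n)
{
    return (n % 10 == 1 && n % 100 != 11) ? 0
        : (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20)) ? 1
        : 2;
}

/*
 * The alternative expression, transcribed as written.  Note that for
 * n = 0 the subtraction n - 1 wraps around in unsigned arithmetic.
 */
static unsigned long
plural_alternative(unsigned long n)
{
    return (((n - 1) % 10) >= (5 - 1) ||
            (((n - 1) % 100) <= (14 - 1) && ((n - 1) % 100) >= (11 - 1))) ? 2
        : (((n - 1) % 10) == (1 - 1)) ? 0
        : 1;
}

/*
 * Return the first n in [0, limit) for which the two expressions pick
 * different slots, or limit if they agree everywhere in that range.
 */
static unsigned long
first_mismatch(unsigned long limit)
{
    unsigned long n;

    for (n = 0; n < limit; n++)
    {
        if (plural_standard(n) != plural_alternative(n))
            return n;
    }
    return limit;
}
```

Calling first_mismatch() with a limit that covers several full cycles of 100 exercises every last-digit and last-two-digit case that either expression distinguishes.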