Thread: problem with plural-forms

problem with plural-forms

From

Zdenek Kotala

Date:

25 May 2009, 13:10:23

I tried to run msgfmt -v ... on Solaris and got the following error:

Processing file "psql-cs.po"...
GNU PO file found.
Generating the MO file in the GNU MO format.
Processing file "psql-cs.po"...
Lines 1311, 1312 (psql-cs.po): incompatible printf-format.
     0 format specifier(s) in "msgid", but 1 format specifier(s) in "msgstr".
 
...
...

Problem is in:

#: print.c:2351
#, c-format
msgid "(1 row)"
msgid_plural "(%lu rows)"
msgstr[0] "(%lu řádka)"
msgstr[1] "(%lu řádky)"
msgstr[2] "(%lu řádek)"


The problem here is (1 row) instead of (%lu row). When I run msgfmt
without -v everything works fine, but I think we should fix it (there
are more occurrences of this issue).
    Zdenek

Re: problem with plural-forms

From

Peter Eisentraut

Date:

26 May 2009, 07:39:18

On Monday 25 May 2009 19:11:24 Zdenek Kotala wrote:
> I tried to run msgfmt -v ... on Solaris and got the following error:
>
> Processing file "psql-cs.po"...
> GNU PO file found.
> Generating the MO file in the GNU MO format.
> Processing file "psql-cs.po"...
> Lines 1311, 1312 (psql-cs.po): incompatible printf-format.
>      0 format specifier(s) in "msgid", but 1 format specifier(s) in
> "msgstr". ...
> ...
>
> Problem is in:
>
> #: print.c:2351
> #, c-format
> msgid "(1 row)"
> msgid_plural "(%lu rows)"
> msgstr[0] "(%lu řádka)"
> msgstr[1] "(%lu řádky)"
> msgstr[2] "(%lu řádek)"
>
>
> The problem here is (1 row) instead of (%lu row). When I run msgfmt
> without -v everything works fine, but I think we should fix it (there
> are more occurrences of this issue).

GNU gettext accepts this, and in fact the GNU gettext documentation explicitly
points out that this is allowed:

"""
     In the English singular case, the number - always 1 - can be
     replaced with "one":

          printf (ngettext ("One file removed", "%d files removed", n), n);

     This works because the `printf' function discards excess arguments
     that are not consumed by the format string.
"""

One might consider this better style (English style, not C style) in some
contexts.

Of course the concrete example that you show doesn't actually take advantage
of this, so if it is important to you, please send a patch to fix it.
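
The quoted pattern can be sketched in a few lines of C. This is only an illustration: pick_plural() is a hypothetical English-only stand-in for ngettext() (no message catalog is consulted), and format_removed() is an invented wrapper name. It relies on the fact that printf-family functions evaluate but otherwise ignore arguments left over once the format string is exhausted.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for ngettext(): picks the English singular or
 * plural string without any catalog lookup. */
static const char *
pick_plural(const char *singular, const char *plural, unsigned long n)
{
    return (n == 1) ? singular : plural;
}

/* n is always passed. The singular string contains no conversion, so
 * the excess argument is evaluated but ignored by snprintf(). */
int
format_removed(char *buf, size_t len, int n)
{
    return snprintf(buf, len,
                    pick_plural("One file removed", "%d files removed",
                                (unsigned long) n),
                    n);
}
```

Calling format_removed(buf, sizeof buf, 1) fills buf with "One file removed"; with n = 3 it yields "3 files removed".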

Re: problem with plural-forms

From

Zdenek Kotala

Date:

26 May 2009, 10:00:59

Peter Eisentraut wrote on Tue, 26 May 2009 at 13:39 +0300:
> On Monday 25 May 2009 19:11:24 Zdenek Kotala wrote:
<snip>
> >
> > The problem here is (1 row) instead of (%lu row). When I run msgfmt
> > without -v everything works fine, but I think we should fix it (there
> > are more occurrences of this issue).
> 
> GNU gettext accepts this, and in fact the GNU gettext documentation explicitly
> points out that this is allowed:
> 
> """
>      In the English singular case, the number - always 1 - can be
>      replaced with "one":
> 
>           printf (ngettext ("One file removed", "%d files removed", n), n);
> 
>      This works because the `printf' function discards excess arguments
>      that are not consumed by the format string.
> """

Yeah, I also checked the printf specification and it is allowed.

> One might consider this better style (English style, not C style) in some 
> contexts.
>
> Of course the concrete example that you show doesn't actually take advantage 
> of this, so if it is important to you, please send a patch to fix it.

It is not a big issue, because it works without -v, but I prefer to fix
it. I will send a patch. I also sent a question to the Solaris i18n group
asking whether it is supported on Solaris.
thanks Zdenek

Re: problem with plural-forms

From

Tom Lane

Date:

26 May 2009, 10:47:52

Peter Eisentraut <peter_e@gmx.net> writes:
> On Monday 25 May 2009 19:11:24 Zdenek Kotala wrote:
>> The problem here is (1 row) instead of (%lu row). When I run msgfmt
>> without -v everything works fine, but I think we should fix it (there
>> are more occurrences of this issue).

> GNU gettext accepts this, and in fact the GNU gettext documentation explicitly
> points out that this is allowed:

>      In the English singular case, the number - always 1 - can be
>      replaced with "one":

>           printf (ngettext ("One file removed", "%d files removed", n), n);

>      This works because the `printf' function discards excess arguments
>      that are not consumed by the format string.

That advice is, if not outright wrong, at least incredibly
short-sighted.  The method breaks the instant you have any additional
values to print.  For example, this ain't gonna work:
      printf (ngettext ("One file removed, containing %lu bytes",
                        "%d files removed, containing %lu bytes", n),
              n, total_bytes);
 

I'm of the opinion that the test being performed by msgfmt -v is
entirely reasonable, and we should not risk such problems for the sake
of sometimes spelling out "one".
        regards, tom lane
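
To make the failure mode concrete: in the singular string above, %lu is the first conversion, so it would consume n (an int) rather than total_bytes. A minimal sketch of the form that passes the msgfmt -v check keeps the same conversions, in the same order, in both strings. As before, pick_plural() is a hypothetical stand-in for ngettext() and format_removed_bytes() is an invented name.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for ngettext(): English-only plural choice. */
static const char *
pick_plural(const char *singular, const char *plural, unsigned long n)
{
    return (n == 1) ? singular : plural;
}

/* Both strings consume (int, unsigned long) in the same order, so the
 * argument list matches whichever form is selected. */
int
format_removed_bytes(char *buf, size_t len, int n, unsigned long total_bytes)
{
    return snprintf(buf, len,
                    pick_plural("%d file removed, containing %lu bytes",
                                "%d files removed, containing %lu bytes",
                                (unsigned long) n),
                    n, total_bytes);
}
```

With n = 1 and total_bytes = 10 this produces "1 file removed, containing 10 bytes".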

Re: problem with plural-forms

From

Alvaro Herrera

Date:

26 May 2009, 11:05:47

Tom Lane wrote:

> That advice is, if not outright wrong, at least incredibly
> short-sighted.  The method breaks the instant you have any additional
> values to print.  For example, this ain't gonna work:
> 
>        printf (ngettext ("One file removed, containing %lu bytes",
>                          "%d files removed, containing %lu bytes", n),
>                n, total_bytes);

I think it should use the %2$s style specifier in that case.  This
should work:

>        printf (ngettext ("One file removed, containing %2$lu bytes",
>                          "%d files removed, containing %lu bytes", n),
>                n, total_bytes);

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: problem with plural-forms

From

Greg Stark

Date:

26 May 2009, 11:14:45

I think that in these two cases using "one" is actively a bad idea.
These aren't English sentences; they're fragments meant to report
numerical results to programmers. We don't use "two" or "three" either.

If the value were just part of some full sentence where the actual
value wasn't the key piece of data, such as some error messages, the
situation might be different.

-- 
Greg

On 26 May 2009, at 15:05, Alvaro Herrera <alvherre@commandprompt.com>  
wrote:

> Tom Lane wrote:
>
>> That advice is, if not outright wrong, at least incredibly
>> short-sighted.  The method breaks the instant you have any additional
>> values to print.  For example, this ain't gonna work:
>>
>>       printf (ngettext ("One file removed, containing %lu bytes",
>>                         "%d files removed, containing %lu bytes", n),
>>               n, total_bytes);
>
> I think it should use the %2$s style specifier in that case.  This
> should work:
>
>>       printf (ngettext ("One file removed, containing %2$lu bytes",
>>                         "%d files removed, containing %lu bytes", n),
>>               n, total_bytes);
>
> -- 
> Alvaro Herrera                                http://www.CommandPrompt.com/
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: problem with plural-forms

From

Tom Lane

Date:

26 May 2009, 11:19:57

Alvaro Herrera <alvherre@commandprompt.com> writes:
> I think it should use the %2$s style specifier in that case.  This
> should work:

> printf (ngettext ("One file removed, containing %2$lu bytes",
>                   "%d files removed, containing %lu bytes", n),
>         n, total_bytes);

How's that gonna work?  In the n=1 case, printf would have no idea about
the type/size of the argument it would need to skip over.

I think maybe you could make it work like this:
      printf (ngettext ("One file removed, containing %1$lu bytes",
                        "%2$d files removed, containing %1$lu bytes", n),
              total_bytes, n);
 

but *for sure* I don't want us playing such games without a robust
compile-time check on both variants of the ngettext string.  I'm
not real sure it's a good idea at all, because of the potential for
confusing translators.  Notice also that we have subtly embedded the
preferred English phrase ordering here: if someone wants to pull the
same type of trick in a language where the bytecount ought to come
first, he's just plain out of luck.
        regards, tom lane
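
A runnable sketch of the reordered variant above, assuming a printf implementation that supports POSIX m$ positional arguments. pick_plural() is again a hypothetical stand-in for ngettext(), and format_removed_positional() is an invented name. The droppable argument (n) goes last, so the singular string references only argument 1 and leaves no gap in the numbering.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for ngettext(): English-only plural choice. */
static const char *
pick_plural(const char *singular, const char *plural, unsigned long n)
{
    return (n == 1) ? singular : plural;
}

/* total_bytes is argument 1 and n is argument 2. The singular string
 * uses only %1$lu, so the unused argument is the trailing one. */
int
format_removed_positional(char *buf, size_t len, int n,
                          unsigned long total_bytes)
{
    return snprintf(buf, len,
                    pick_plural("One file removed, containing %1$lu bytes",
                                "%2$d files removed, containing %1$lu bytes",
                                (unsigned long) n),
                    total_bytes, n);
}
```

This only shows that the ordering trick can work mechanically; it does nothing to address the concerns about compile-time checking or translator confusion.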

Re: problem with plural-forms

From

Alvaro Herrera

Date:

26 May 2009, 11:27:07

Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > I think it should use the %2$s style specifier in that case.  This
> > should work:
> 
> > printf (ngettext ("One file removed, containing %2$lu bytes",
> >                   "%d files removed, containing %lu bytes", n),
> >         n, total_bytes);
> 
> How's that gonna work?  In the n=1 case, printf would have no idea about
> the type/size of the argument it would need to skip over.

Hmm, I admit I have no idea how it works ... but now that I think about
it, you are right that at least I only use it with the whole argument
array, just in a different order.

> I think maybe you could make it work like this:
> 
>        printf (ngettext ("One file removed, containing %1$lu bytes",
>                          "%2$d files removed, containing %1$lu bytes", n),
>                total_bytes, n);
> 
> but *for sure* I don't want us playing such games without a robust
> compile-time check on both variants of the ngettext string.  I'm
> not real sure it's a good idea at all, because of the potential for
> confusing translators.  Notice also that we have subtly embedded the
> preferred English phrase ordering here: if someone wants to pull the
> same type of trick in a language where the bytecount ought to come
> first, he's just plain out of luck.

Agreed on both counts.  We have enough trouble finding translators as it
is; I don't want to know what would happen if we were to confuse them
with this :-)

I find it strange that this topic has not been fully hashed out in the
GNU gettext documentation.  Maybe we should talk to them.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: problem with plural-forms

From

Tom Lane

Date:

26 May 2009, 11:29:24

I wrote:
> ... Notice also that we have subtly embedded the
> preferred English phrase ordering here: if someone wants to pull the
> same type of trick in a language where the bytecount ought to come
> first, he's just plain out of luck.

Uh, scratch that [ not enough caffeine yet ].  What this coding embeds
is the assumption that the filecount is the only variable we might wish
to replace with a constant string, which is safe enough since that's the
only one that we know a fixed value for in any one ngettext string.

Still, I agree with Greg's opinion that this is just not a real good
thing to be doing.
        regards, tom lane

Re: problem with plural-forms

From

Peter Eisentraut

Date:

26 May 2009, 11:33:04

On Tuesday 26 May 2009 17:19:50 Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > I think it should use the %2$s style specifier in that case.  This
> > should work:
> >
> > printf (ngettext ("One file removed, containing %2$lu bytes",
> >                   "%d files removed, containing %lu bytes", n),
> >         n, total_bytes);
>
> How's that gonna work?  In the n=1 case, printf would have no idea about
> the type/size of the argument it would need to skip over.

gcc -Wall actually warns if you do this.

Re: problem with plural-forms

From

Aidan Van Dyk

Date:

26 May 2009, 11:36:09

* Alvaro Herrera <alvherre@commandprompt.com> [090526 10:06]:
> Tom Lane wrote:
>
> > That advice is, if not outright wrong, at least incredibly
> > short-sighted.  The method breaks the instant you have any additional
> > values to print.  For example, this ain't gonna work:
> >
> >        printf (ngettext ("One file removed, containing %lu bytes",
> >                          "%d files removed, containing %lu bytes", n),
> >                n, total_bytes);
>
> I think it should use the %2$s style specifier in that case.  This
> should work:
>
> >        printf (ngettext ("One file removed, containing %2$lu bytes",
> >                          "%d files removed, containing %lu bytes", n),
> >                n, total_bytes);

From the glibc printf man page:
   "There  may  be no gaps in the numbers of arguments specified using
    '$'; for example, if arguments 1 and 3 are specified, argument 2 must
    also be specified somewhere in the format string."

So, is skipping 1 allowed?

But, it *is* a commonly used form, especially in translations (where
orders of things need to be flipped), and is already used in many of the
translated PG .po files.

That said, I do think the "msgid" should be using the % args, not words
for a few reasons:
1) Make it more clear for translators the arguments and their ordering
   without having to visit the source code
2) On crufty systems without gettext, I wouldn't expect them to support m$
   modifiers then either...
3) Greg's "these are numbers, not sentences" is how I expect the system
   to work...

a.

--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: problem with plural-forms

From

Peter Eisentraut

Date:

26 May 2009, 11:41:19

On Tuesday 26 May 2009 16:47:44 Tom Lane wrote:
> The method breaks the instant you have any additional
> values to print.  For example, this ain't gonna work:
>
>        printf (ngettext ("One file removed, containing %lu bytes",
>                          "%d files removed, containing %lu bytes", n),
>                n, total_bytes);

Don't do that then.  This only shows that you cannot implement everything this 
way.  It does not show why the things that you can implement are wrong.

> I'm of the opinion that the test being performed by msgfmt -v is
> entirely reasonable, and we should not risk such problems for the sake
> of sometimes spelling out "one".

I have no objections to this.  I am only pointing out how we arrived at the 
current state.

Re: problem with plural-forms

From

Tom Lane

Date:

26 May 2009, 11:55:46

Aidan Van Dyk <aidan@highrise.ca> writes:
> From the glibc printf man page:
>    "There  may  be no gaps in the numbers of arguments specified using
>     '$'; for example, if arguments 1 and 3 are specified, argument 2 must
>     also be specified somewhere in the format string."

> So, is skipping 1 allowed?

No --- the point is that printf has to be able to figure out where each
argument is on the stack, so it must be able to infer the size of each
of the arguments from left to right.

> That said, I do think the "msgid" should be using the % args, not words
> for a few reasons:
> 1) Make it more clear for translators the arguments and their ordering
>    without having to visit the source code
> 2) On crufty systems without gettext, I wouldn't expect them to support m$
>    modifiers then either...
> 3) Greg's "these are numbers, not sentences" is how I expect the system
>    to work...

Actually, configure checks to see if the local printf supports m$ or
not, and we use our own printf implementation if not.  So I'm not
worried about #2.  I agree with your other points though.

(So, if you wanna see how this is done, try src/port/snprintf.c)
        regards, tom lane
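
As a rough illustration of what probing for m$ support can look like, here is a small runtime check. It is not PostgreSQL's actual configure test, and snprintf_supports_positional() is an invented name; it simply formats two strings in swapped order and compares the result.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if snprintf() honors m$ positional arguments, 0 otherwise.
 * A library without m$ support is expected to produce different output. */
int
snprintf_supports_positional(void)
{
    char buf[32];

    /* Argument 2 is printed first, then argument 1. */
    snprintf(buf, sizeof buf, "%2$s %1$s", "world", "hello");
    return strcmp(buf, "hello world") == 0;
}
```

On a C library that implements the POSIX numbered-argument syntax, such as glibc, this should return 1.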

Re: problem with plural-forms

From

Aidan Van Dyk

Date:

26 May 2009, 12:02:46

* Tom Lane <tgl@sss.pgh.pa.us> [090526 10:56]:
> Actually, configure checks to see if the local printf supports m$ or
> not, and we use our own printf implementation if not.  So I'm not
> worried about #2.  I agree with your other points though.
> 
> (So, if you wanna see how this is done, try src/port/snprintf.c)
> 
>             regards, tom lane

So what part of a working libc does PG use that it *doesn't* have to
carry around in src/port/?

;-)

a.

-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Re: problem with plural-forms

From

Zdenek Kotala

Date:

27 May 2009, 17:01:36

Peter Eisentraut wrote on Tue, 26 May 2009 at 13:39 +0300:

> Of course the concrete example that you show doesn't actually take advantage
> of this, so if it is important to you, please send a patch to fix it.

Fix attached. I found only two problems, both in psql. I did not fix the .po
files. Is it necessary to fix them manually, or do you regenerate the files?

    thanks Zdenek

Attachment

i18n.patch

Re: problem with plural-forms

From

Peter Eisentraut

Date:

27 May 2009, 17:15:09

On Monday 25 May 2009 19:11:24 Zdenek Kotala wrote:
> The problem here is (1 row) instead of (%lu row). When I run msgfmt
> without -v everything works fine, but I think we should fix it (there
> are more occurrences of this issue).

I don't think we can find all these occurrences without the Solaris version of 
msgfmt.  So please send a complete error log over all files, or better yet a 
patch.

Re: problem with plural-forms

From

Zdenek Kotala

Date:

27 May 2009, 18:05:56

Here is the output of:

for FILE in `find . -name '*.po'`; do LC_ALL=C msgfmt -v -o /dev/null "$FILE" 2>> msgfmt.txt; done

    Zdenek

Peter Eisentraut wrote on Wed, 27 May 2009 at 23:08 +0300:
> On Monday 25 May 2009 19:11:24 Zdenek Kotala wrote:
> > The problem here is (1 row) instead of (%lu row). When I run msgfmt
> > without -v everything works fine, but I think we should fix it (there
> > are more occurrences of this issue).
>
> I don't think we can find all these occurrences without the Solaris version of
> msgfmt.  So please send a complete error log over all files, or better yet a
> patch.

Attachment

msgfmt.txt

Re: problem with plural-forms

From

Peter Eisentraut

Date:

28 May 2009, 03:06:34

On Wednesday 27 May 2009 23:02:19 Zdenek Kotala wrote:
> Peter Eisentraut wrote on Tue, 26 May 2009 at 13:39 +0300:
> > Of course the concrete example that you show doesn't actually take
> > advantage of this, so if it is important to you, please send a patch to
> > fix it.
>
> Fix attached. I found only two problems, both in psql. I did not fix the .po
> files. Is it necessary to fix them manually, or do you regenerate the files?

fixed