Thread: problem with plural-forms
I tried to run msgfmt -v ... on Solaris and I got the following error:

Processing file "psql-cs.po"...
GNU PO file found.
Generating the MO file in the GNU MO format.
Processing file "psql-cs.po"...
Lines 1311, 1312 (psql-cs.po): incompatible printf-format.
0 format specifier(s) in "msgid", but 1 format specifier(s) in "msgstr". ...
...

The problem is in:

#: print.c:2351
#, c-format
msgid "(1 row)"
msgid_plural "(%lu rows)"
msgstr[0] "(%lu řádka)"
msgstr[1] "(%lu řádky)"
msgstr[2] "(%lu řádek)"

The problem here is (1 row) instead of (%lu row). When I run msgfmt without -v everything works fine, but I think we should fix it (there are more occurrences of this issue).

Zdenek

On Monday 25 May 2009 19:11:24 Zdenek Kotala wrote:
> I tried to run msgfmt -v ... on Solaris and I got the following error:
>
> Processing file "psql-cs.po"...
> GNU PO file found.
> Generating the MO file in the GNU MO format.
> Processing file "psql-cs.po"...
> Lines 1311, 1312 (psql-cs.po): incompatible printf-format.
> 0 format specifier(s) in "msgid", but 1 format specifier(s) in
> "msgstr". ...
> ...
>
> Problem is in:
>
> #: print.c:2351
> #, c-format
> msgid "(1 row)"
> msgid_plural "(%lu rows)"
> msgstr[0] "(%lu řádka)"
> msgstr[1] "(%lu řádky)"
> msgstr[2] "(%lu řádek)"
>
>
> The problem here is (1 row) instead of (%lu row). When I run msgfmt
> without -v everything works fine but I think we should fix it (there
> are more occurrences of this issue).

GNU gettext accepts this, and in fact the GNU gettext documentation explicitly points out that this is allowed:

"""
In the English singular case, the number - always 1 - can be replaced with "one":

    printf (ngettext ("One file removed", "%d files removed", n), n);

This works because the `printf' function discards excess arguments that are not consumed by the format string.
"""

One might consider this better style (English style, not C style) in some contexts.

Of course the concrete example that you show doesn't actually take advantage of this, so if it is important to you, please send a patch to fix it.

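A minimal sketch of the behavior the gettext manual describes, using the psql row-count strings from this thread. `pick_plural` is a hypothetical stand-in for `ngettext` (no message catalog is consulted), and `format_rows` is an illustrative helper, not psql code. The C standard says that arguments left over once the format is exhausted are evaluated and then ignored, which is why the singular string can omit the conversion:

```c
#include <stdio.h>

/* Hypothetical stand-in for ngettext(): chooses between the English
 * singular and plural strings without consulting any catalog. */
static const char *
pick_plural(const char *singular, const char *plural, unsigned long n)
{
    return (n == 1) ? singular : plural;
}

/* Illustrative helper: writes the row-count footer into buf.
 * When n == 1 the chosen format has no conversion, so the trailing
 * argument n is evaluated but otherwise unused. */
static int
format_rows(char *buf, size_t size, unsigned long n)
{
    return snprintf(buf, size, pick_plural("(1 row)", "(%lu rows)", n), n);
}
```

Calling `format_rows(buf, sizeof buf, 1)` leaves `(1 row)` in the buffer, and the same call with 3 leaves `(3 rows)`.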
Peter Eisentraut wrote on Tue, 26 May 2009 at 13:39 +0300:
> On Monday 25 May 2009 19:11:24 Zdenek Kotala wrote:

<snip>

> > The problem here is (1 row) instead of (%lu row). When I run msgfmt
> > without -v everything works fine but I think we should fix it (there
> > are more occurrences of this issue).
>
> GNU gettext accepts this, and in fact the GNU gettext documentation explicitly
> points out that this is allowed:
>
> """
> In the English singular case, the number - always 1 - can be
> replaced with "one":
>
> printf (ngettext ("One file removed", "%d files removed", n), n);
>
> This works because the `printf' function discards excess arguments
> that are not consumed by the format string.
> """

Yeah, I also checked the printf specification and it is allowed.

> One might consider this better style (English style, not C style) in some
> contexts.
>
> Of course the concrete example that you show doesn't actually take advantage
> of this, so if it is important to you, please send a patch to fix it.

It is not a big issue, because it works without -v, but I prefer to fix it. I will send a patch. I also sent a question to the Solaris i18n group asking whether it is supported on Solaris.

thanks Zdenek

Peter Eisentraut <peter_e@gmx.net> writes:
> On Monday 25 May 2009 19:11:24 Zdenek Kotala wrote:
>> The problem here is (1 row) instead of (%lu row). When I run msgfmt
>> without -v everything works fine but I think we should fix it (there
>> are more occurrences of this issue).

> GNU gettext accepts this, and in fact the GNU gettext documentation explicitly
> points out that this is allowed:

> In the English singular case, the number - always 1 - can be
> replaced with "one":

> printf (ngettext ("One file removed", "%d files removed", n), n);

> This works because the `printf' function discards excess arguments
> that are not consumed by the format string.

That advice is, if not outright wrong, at least incredibly short-sighted. The method breaks the instant you have any additional values to print. For example, this ain't gonna work:

    printf (ngettext ("One file removed, containing %lu bytes",
                      "%d files removed, containing %lu bytes", n),
            n, total_bytes);

I'm of the opinion that the test being performed by msgfmt -v is entirely reasonable, and we should not risk such problems for the sake of sometimes spelling out "one".

regards, tom lane

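To spell out the failure: in the singular string the only conversion is `%lu`, so printf pairs it with the first argument, `n` (an int), and never reaches `total_bytes`. A sketch of the form the msgfmt -v check steers toward, where both strings carry the same conversions in the same order, might look like this. `choose_form` is a hypothetical stand-in for `ngettext`, and `format_removed` is an illustrative name:

```c
#include <stdio.h>

/* Hypothetical stand-in for ngettext(): English-only plural choice. */
static const char *
choose_form(const char *singular, const char *plural, int n)
{
    return (n == 1) ? singular : plural;
}

/* Both variants consume (int, unsigned long) in that order, so the
 * argument list matches whichever string is selected. */
static int
format_removed(char *buf, size_t size, int n, unsigned long total_bytes)
{
    const char *fmt = choose_form("%d file removed, containing %lu bytes",
                                  "%d files removed, containing %lu bytes",
                                  n);

    return snprintf(buf, size, fmt, n, total_bytes);
}
```

The cost is that the singular reads "1 file removed" rather than "One file removed", which is the trade-off under discussion.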
Tom Lane wrote:

> That advice is, if not outright wrong, at least incredibly
> short-sighted. The method breaks the instant you have any additional
> values to print. For example, this ain't gonna work:
>
> printf (ngettext ("One file removed, containing %lu bytes",
>                   "%d files removed, containing %lu bytes", n),
>         n, total_bytes);

I think it should use the %2$s style specifier in that case. This should work:

> printf (ngettext ("One file removed, containing %2$lu bytes",
>                   "%d files removed, containing %lu bytes", n),
>         n, total_bytes);

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

I think in these two cases using "one" is actively a bad idea. These aren't English sentences; they're fragments meant to report numerical results to programmers. We don't use "two" or "three" either.

If the value were just part of some full sentence where the actual value wasn't the key piece of data, such as some error messages, the situation might be different.

--
Greg

On 26 May 2009, at 15:05, Alvaro Herrera <alvherre@commandprompt.com> wrote:

> Tom Lane wrote:
>
>> That advice is, if not outright wrong, at least incredibly
>> short-sighted. The method breaks the instant you have any additional
>> values to print. For example, this ain't gonna work:
>>
>> printf (ngettext ("One file removed, containing %lu bytes",
>>                   "%d files removed, containing %lu bytes", n),
>>         n, total_bytes);
>
> I think it should use the %2$s style specifier in that case. This
> should work:
>
>> printf (ngettext ("One file removed, containing %2$lu bytes",
>>                   "%d files removed, containing %lu bytes", n),
>>         n, total_bytes);
>
> --
> Alvaro Herrera http://www.CommandPrompt.com/
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Alvaro Herrera <alvherre@commandprompt.com> writes:
> I think it should use the %2$s style specifier in that case. This
> should work:

> printf (ngettext ("One file removed, containing %2$lu bytes",
>                   "%d files removed, containing %lu bytes", n),
>         n, total_bytes);

How's that gonna work? In the n=1 case, printf would have no idea about the type/size of the argument it would need to skip over. I think maybe you could make it work like this:

    printf (ngettext ("One file removed, containing %1$lu bytes",
                      "%2$d files removed, containing %1$lu bytes", n),
            total_bytes, n);

but *for sure* I don't want us playing such games without a robust compile-time check on both variants of the ngettext string. I'm not real sure it's a good idea at all, because of the potential for confusing translators. Notice also that we have subtly embedded the preferred English phrase ordering here: if someone wants to pull the same type of trick in a language where the bytecount ought to come first, he's just plain out of luck.

regards, tom lane

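A sketch of the reordered-argument variant Tom describes, assuming a printf that implements the POSIX `%n$` positional syntax (it is not part of ISO C). `format_removed_positional` is an illustrative name, and the conditional expression stands in for `ngettext`. Because `total_bytes` is argument 1, the singular string can reference it without leaving a gap in the numbering, and the unused second argument is simply ignored:

```c
#include <stdio.h>

/* Illustrative helper, assuming POSIX positional conversions.
 * The arguments are passed as (total_bytes, n) so that the singular
 * string only needs argument 1 and never has to skip over an
 * argument whose type it does not know. */
static int
format_removed_positional(char *buf, size_t size, int n,
                          unsigned long total_bytes)
{
    /* Stand-in for ngettext(): English-only plural choice. */
    const char *fmt = (n == 1)
        ? "One file removed, containing %1$lu bytes"
        : "%2$d files removed, containing %1$lu bytes";

    return snprintf(buf, size, fmt, total_bytes, n);
}
```

Note that the argument order in the call is now dictated by the message strings, which is part of what makes this fragile for translators.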
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > I think it should use the %2$s style specifier in that case. This
> > should work:
> >
> > printf (ngettext ("One file removed, containing %2$lu bytes",
> >                   "%d files removed, containing %lu bytes", n),
> >         n, total_bytes);
>
> How's that gonna work? In the n=1 case, printf would have no idea about
> the type/size of the argument it would need to skip over.

Hmm, I admit I have no idea how it works ... but now that I think about it, you are right that at least I only use it with the whole argument array, just in a different order.

> I think maybe you could make it work like this:
>
> printf (ngettext ("One file removed, containing %1$lu bytes",
>                   "%2$d files removed, containing %1$lu bytes", n),
>         total_bytes, n);
>
> but *for sure* I don't want us playing such games without a robust
> compile-time check on both variants of the ngettext string. I'm
> not real sure it's a good idea at all, because of the potential for
> confusing translators. Notice also that we have subtly embedded the
> preferred English phrase ordering here: if someone wants to pull the
> same type of trick in a language where the bytecount ought to come
> first, he's just plain out of luck.

Agreed on both counts. We have enough trouble finding translators as it is; I don't want to know what would happen if we were to confuse them with this :-)

I find it strange that this topic has not been fully hashed out in the GNU gettext documentation. Maybe we should talk to them.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

I wrote:
> ... Notice also that we have subtly embedded the
> preferred English phrase ordering here: if someone wants to pull the
> same type of trick in a language where the bytecount ought to come
> first, he's just plain out of luck.

Uh, scratch that [ not enough caffeine yet ]. What this coding embeds is the assumption that the filecount is the only variable we might wish to replace with a constant string, which is safe enough since that's the only one that we know a fixed value for in any one ngettext string.

Still, I agree with Greg's opinion that this is just not a real good thing to be doing.

regards, tom lane

On Tuesday 26 May 2009 17:19:50 Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > I think it should use the %2$s style specifier in that case. This
> > should work:
> >
> > printf (ngettext ("One file removed, containing %2$lu bytes",
> >                   "%d files removed, containing %lu bytes", n),
> >         n, total_bytes);
>
> How's that gonna work? In the n=1 case, printf would have no idea about
> the type/size of the argument it would need to skip over.

gcc -Wall actually warns if you do this.

* Alvaro Herrera <alvherre@commandprompt.com> [090526 10:06]:
> Tom Lane wrote:
>
> > That advice is, if not outright wrong, at least incredibly
> > short-sighted. The method breaks the instant you have any additional
> > values to print. For example, this ain't gonna work:
> >
> > printf (ngettext ("One file removed, containing %lu bytes",
> >                   "%d files removed, containing %lu bytes", n),
> >         n, total_bytes);
>
> I think it should use the %2$s style specifier in that case. This
> should work:
>
> > printf (ngettext ("One file removed, containing %2$lu bytes",
> >                   "%d files removed, containing %lu bytes", n),
> >         n, total_bytes);

From the glibc printf man page:

    "There may be no gaps in the numbers of arguments specified using '$'; for example, if arguments 1 and 3 are specified, argument 2 must also be specified somewhere in the format string."

So, is skipping 1 allowed?

But, it *is* a commonly used form, especially in translations (where orders of things need to be flipped), and is already used in many of the translated PG .po files.

That said, I do think the "msgid" should be using the % args, not words, for a few reasons:
1) Make it more clear for translators the arguments and their ordering without having to visit the source code
2) On crufty systems without gettext, I wouldn't expect them to support m$ modifiers then either...
3) Greg's "these are numbers, not sentences" is how I expect the system to work...

a.

--
Aidan Van Dyk                 Create like a god,
aidan@highrise.ca             command like a king,
http://www.highrise.ca/       work like a slave.

On Tuesday 26 May 2009 16:47:44 Tom Lane wrote:
> The method breaks the instant you have any additional
> values to print. For example, this ain't gonna work:
>
> printf (ngettext ("One file removed, containing %lu bytes",
>                   "%d files removed, containing %lu bytes", n),
>         n, total_bytes);

Don't do that then. This only shows that you cannot implement everything this way. It does not show why the things that you can implement are wrong.

> I'm of the opinion that the test being performed by msgfmt -v is
> entirely reasonable, and we should not risk such problems for the sake
> of sometimes spelling out "one".

I have no objections to this. I am only pointing out how we arrived at the current state.

Aidan Van Dyk <aidan@highrise.ca> writes:
> From the glibc printf man page:
> "There may be no gaps in the numbers of arguments specified using
> '$'; for example, if arguments 1 and 3 are specified, argument 2 must
> also be specified somewhere in the format string."

> So, is skipping 1 allowed?

No --- the point is that printf has to be able to figure out where each argument is on the stack, so it must be able to infer the size of each of the arguments from left to right.

> That said, I do think the "msgid" should be using the % args, not words
> for a few reasons:
> 1) Make it more clear for translators the arguments and their ordering
> without having to visit the source code
> 2) On crufty systems without gettext, I wouldn't expect them to support m$
> modifiers then either...
> 3) Greg's "these are numbers, not sentences" is how I expect the system
> to work...

Actually, configure checks to see if the local printf supports m$ or not, and we use our own printf implementation if not. So I'm not worried about #2. I agree with your other points though.

(So, if you wanna see how this is done, try src/port/snprintf.c)

regards, tom lane

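A rough sketch of the kind of probe a build system could run to decide whether the platform printf understands `%n$`. This is an illustration under stated assumptions, not the actual PostgreSQL configure test, and `printf_supports_positional_args` is a hypothetical name. It formats two integers in swapped order and compares the result with what a conforming POSIX implementation would produce:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical probe: returns 1 if snprintf honors POSIX positional
 * conversions, 0 otherwise. Integer arguments are used so that a
 * library without support is not asked to dereference anything. */
static int
printf_supports_positional_args(void)
{
    char buf[32];
    const char *fmt = "%2$d %1$d";

    snprintf(buf, sizeof buf, fmt, 3, 4);
    return strcmp(buf, "4 3") == 0;
}
```

What a library without support does with such a format is not defined by ISO C, so a probe like this belongs in a throwaway test program rather than in production code paths.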
* Tom Lane <tgl@sss.pgh.pa.us> [090526 10:56]:

> Actually, configure checks to see if the local printf supports m$ or
> not, and we use our own printf implementation if not. So I'm not
> worried about #2. I agree with your other points though.
>
> (So, if you wanna see how this is done, try src/port/snprintf.c)
>
> regards, tom lane

So what part of a working libc does PG use that it *doesn't* have to carry around in src/port/? ;-)

a.

--
Aidan Van Dyk                 Create like a god,
aidan@highrise.ca             command like a king,
http://www.highrise.ca/       work like a slave.

Peter Eisentraut wrote on Tue, 26 May 2009 at 13:39 +0300:
> Of course the concrete example that you show doesn't actually take advantage
> of this, so if it is important to you, please send a patch to fix it.

Fix attached. I found only two problems, both in psql. I did not fix the .po files. Is it necessary to fix them manually, or do you regenerate the files?

thanks Zdenek

Attachment
On Monday 25 May 2009 19:11:24 Zdenek Kotala wrote:
> The problem here is (1 row) instead of (%lu row). When I run msgfmt
> without -v everything works fine but I think we should fix it (there
> are more occurrences of this issue).

I don't think we can find all these occurrences without the Solaris version of msgfmt. So please send a complete error log over all files, or better yet a patch.

Here is the output of:

for FILE in `find . -name *.po`;do LC_ALL=C msgfmt -v -o /dev/null $FILE 2>> msgfmt.txt; done

Zdenek

Peter Eisentraut wrote on Wed, 27 May 2009 at 23:08 +0300:
> On Monday 25 May 2009 19:11:24 Zdenek Kotala wrote:
> > The problem here is (1 row) instead of (%lu row). When I run msgfmt
> > without -v everything works fine but I think we should fix it (there
> > are more occurrences of this issue).
>
> I don't think we can find all these occurrences without the Solaris version of
> msgfmt. So please send a complete error log over all files, or better yet a
> patch.

Attachment
On Wednesday 27 May 2009 23:02:19 Zdenek Kotala wrote:
> Peter Eisentraut wrote on Tue, 26 May 2009 at 13:39 +0300:
> > Of course the concrete example that you show doesn't actually take
> > advantage of this, so if it is important to you, please send a patch to
> > fix it.
>
> Fix attached. I found only two problems, both in psql. I did not fix .po
> files. Is it necessary to fix them manually or do you regenerate files?

fixed