Thread: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From: Pavel Stehule
Hello

I found that we don't have any functionality for simple text alignment, so I propose supporting a width for %s, following printf's behavior. The glibc implementation also defines a rule for precision, which I don't want to implement, because it is oriented to bytes and not to chars, and that can be confusing. I would still like to keep the implementation and design of the "format" function as simple as possible, and a rule for the "s" specifier with a width is clean and simple.

postgres=# select format('||%4s|| ||%-4s||', 'ab', 'ab');
      format
-------------------
 ||  ab|| ||ab  ||

I also found that our implementation of positional and ordered placeholders is not correct.

-- correct
postgres=# select format('%s %2$s %s', 'Hello', 'World');
      format
-------------------
 Hello World World

-- our current behavior
postgres=# select format('%s %2$s %s', 'Hello', 'World');
ERROR: too few arguments for format
postgres=#

Comments, remarks?

Regards

Pavel Stehule
Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From: Stephen Frost
Pavel,

* Pavel Stehule (pavel.stehule@gmail.com) wrote:
> I found that we don't have any functionality for simple text alignment,
> so I propose supporting a width for %s, following printf's behavior.
> The glibc implementation also defines a rule for precision, which I
> don't want to implement, because it is oriented to bytes and not to
> chars, and that can be confusing. I would still like to keep the
> implementation and design of the "format" function as simple as
> possible, and a rule for the "s" specifier with a width is clean and
> simple.

I started looking at this patch to get a head-start on the next commitfest. There's no documentation, which certainly needs to be fixed, but worse, this doesn't appear to match glibc printf and it's not entirely clear to me why it doesn't.

> -- our current behavior
> postgres=# select format('%s %2$s %s', 'Hello', 'World');
> ERROR: too few arguments for format
> postgres=#

This is correct, if we're matching glibc (and SUS, I believe), isn't it? You're not allowed to mix '%2$s' type parameters and '%s' in a single format.

Thanks,

Stephen
Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From: Pavel Stehule
Hello

2012/12/29 Stephen Frost <sfrost@snowman.net>:
> * Pavel Stehule (pavel.stehule@gmail.com) wrote:
>> -- our current behavior
>> postgres=# select format('%s %2$s %s', 'Hello', 'World');
>> ERROR: too few arguments for format
>> postgres=#
>
> This is correct, if we're matching glibc (and SUS, I believe), isn't it?
> You're not allowed to mix '%2$s' type parameters and '%s' in a single
> format.

I am not sure; please recheck:

pavel ~ $ cat test.c
#include <stdio.h>

void main()
{

  printf("%s %2$s %s\n", "AHOJ", "Svete");
}

pavel ~ $ gcc test.c # no warning here
pavel ~ $ gcc --version
gcc (GCC) 4.7.2 20120921 (Red Hat 4.7.2-2)
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

pavel ~ $ ./a.out
AHOJ Svete Svete
pavel ~ $

Regards

Pavel Stehule
Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From: Stephen Frost
Pavel,

* Pavel Stehule (pavel.stehule@gmail.com) wrote:
> 2012/12/29 Stephen Frost <sfrost@snowman.net>:
> > This is correct, if we're matching glibc (and SUS, I believe), isn't it?
> > You're not allowed to mix '%2$s' type parameters and '%s' in a single
> > format.
>
> I am not sure; please recheck:

According to the man pages on my Ubuntu system, under 'Format of the format string':

-------------------
If the style using '$' is used, it must be used throughout for
all conversions taking an argument and all width and precision
arguments, but it may be mixed with "%%" formats which do not consume
an argument.
-------------------

> pavel ~ $ cat test.c
> #include <stdio.h>
>
> void main()
> {
>
>   printf("%s %2$s %s\n", "AHOJ", "Svete");
> }
>
> pavel ~ $ gcc test.c # no warning here

You didn't turn any on...

sfrost@tamriel:/home/sfrost> gcc -o qq -Wall test.c
test.c: In function ‘main’:
test.c:5:3: warning: $ operand number used after format without operand number [-Wformat]

Thanks,

Stephen
Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From: Pavel Stehule
2012/12/29 Stephen Frost <sfrost@snowman.net>:
> According to the man pages on my Ubuntu system, under 'Format of the
> format string':
>
> -------------------
> If the style using '$' is used, it must be used throughout for
> all conversions taking an argument and all width and precision
> arguments, but it may be mixed with "%%" formats which do not consume
> an argument.
> -------------------
>
> You didn't turn any on...
>
> sfrost@tamriel:/home/sfrost> gcc -o qq -Wall test.c
> test.c: In function ‘main’:
> test.c:5:3: warning: $ operand number used after format without operand number [-Wformat]

OK, so what is the proposed solution?

I see two possibilities: a) apply my current patch, although it is not fully correct, or b) write a new patch that does the necessary check and raises a more descriptive error message.

I don't have a strong preference here; both variants are acceptable to me and I welcome any community opinion. But the current state is not intuitive and should be fixed.

Regards

Pavel
Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From: Stephen Frost
* Pavel Stehule (pavel.stehule@gmail.com) wrote:
> OK, so what is the proposed solution?

My recommendation would be to match what glibc's printf does.

> I see two possibilities: a) apply my current patch, although it is not
> fully correct, or b) write a new patch that does the necessary check
> and raises a more descriptive error message.

Right: have a new patch that does error-checking and returns a better error in that case, update the docs to reflect that restriction, and then (ideally as an additional, independent patch) implement the width capability (and, ideally, the ability to pass the width as an argument, as glibc supports) matching the glibc arguments.

Part of the reason this restriction is in place, I believe, is that glibc expects the width to come before any explicit argument being passed, and if an explicit argument is used for the width then an explicit argument has to be used for the value as well; otherwise it wouldn't be clear from the format which number was the argument number and which was the explicit width.

I don't think it's a good idea to come up with our own format definition, particularly one which looks so similar to the well-known printf() format.

> I don't have a strong preference here; both variants are acceptable to
> me and I welcome any community opinion. But the current state is not
> intuitive and should be fixed.

Agreed.

Thanks,

Stephen
Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From: Pavel Stehule
Hello Stephen

2012/12/29 Stephen Frost <sfrost@snowman.net>:
> Right: have a new patch that does error-checking and returns a better
> error in that case, update the docs to reflect that restriction, and
> then (ideally as an additional, independent patch) implement the width
> capability (and, ideally, the ability to pass the width as an argument,
> as glibc supports) matching the glibc arguments.
>
> Part of the reason this restriction is in place, I believe, is that
> glibc expects the width to come before any explicit argument being
> passed, and if an explicit argument is used for the width then an
> explicit argument has to be used for the value as well; otherwise it
> wouldn't be clear from the format which number was the argument number
> and which was the explicit width.

I found one issue: if I disallow mixing the positional and ordered styles, I break compatibility with the previous implementation.

So maybe a third way is better: use the fix from my patch (the behavior is the same as in glibc) and raise a warning instead of an error when mixed styles are detected. We can replace the warnings with errors in the future.

What do you think?

Regards

Pavel
Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From: Pavel Stehule
2012/12/30 Pavel Stehule <pavel.stehule@gmail.com>:
> So maybe a third way is better: use the fix from my patch (the behavior
> is the same as in glibc) and raise a warning instead of an error when
> mixed styles are detected. We can replace the warnings with errors in
> the future.

This is exactly what gcc does, and it does so without breaking applications.
Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From: Stephen Frost
Pavel,

* Pavel Stehule (pavel.stehule@gmail.com) wrote:
> I found one issue: if I disallow mixing the positional and ordered
> styles, I break compatibility with the previous implementation.

Can you elaborate? In the previous example, an error was returned when mixing (not a terribly good one, but still an error). Returning a better error won't be a problem.

> So maybe a third way is better: use the fix from my patch (the behavior
> is the same as in glibc) and raise a warning instead of an error when
> mixed styles are detected. We can replace the warnings with errors in
> the future.
>
> What do you think?

If there are cases which work today then I agree that we should issue a warning to avoid breaking existing applications. We should still use the glibc format when adding width support, however.

Thanks,

Stephen
Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From: Pavel Stehule
Hello

2012/12/31 Stephen Frost <sfrost@snowman.net>:
> * Pavel Stehule (pavel.stehule@gmail.com) wrote:
>> I found one issue: if I disallow mixing the positional and ordered
>> styles, I break compatibility with the previous implementation.
>
> Can you elaborate? In the previous example, an error was returned when
> mixing (not a terribly good one, but still an error). Returning a
> better error won't be a problem.

The outcome of our previous discussion was to disallow mixing positional and ordered placeholders completely, as the man page requires (gcc raises warnings there).

But mixing is not explicitly disallowed in our docs, and mixing is tested in our regression tests. There are tests where placeholders are mixed, so anybody may be using it:

select format('Hello %s %1$s %s', 'World', 'Hello again');
-- is allowed and supported, and the result is as expected

-- but this raises an error, and it is the same kind of mixing as the previous example
select format('%s %2$s %s', 'Hello', 'World');

For this functionality to be consistent, either both examples should work or both should be disallowed. I can't break the first example, so I have to repair the second one.

Regards

Pavel
Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From: Stephen Frost
Pavel,

* Pavel Stehule (pavel.stehule@gmail.com) wrote:
> The outcome of our previous discussion was to disallow mixing
> positional and ordered placeholders completely, as the man page
> requires (gcc raises warnings there).
>
> But mixing is not explicitly disallowed in our docs, and mixing is
> tested in our regression tests. There are tests where placeholders are
> mixed, so anybody may be using it:
> select format('Hello %s %1$s %s', 'World', 'Hello again');
> -- is allowed and supported, and the result is as expected

Alright, then I agree that raising a warning in that case makes sense, and let's update the docs to reflect that it shouldn't be done (like what glibc/gcc do).

Thanks,

Stephen
Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From: Pavel Stehule
2012/12/31 Stephen Frost <sfrost@snowman.net>:
> Alright, then I agree that raising a warning in that case makes sense,
> and let's update the docs to reflect that it shouldn't be done (like
> what glibc/gcc do).

OK, I'll prepare a patch.

Regards

Pavel
Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From: Pavel Stehule
Hello

2012/12/31 Stephen Frost <sfrost@snowman.net>:
> Alright, then I agree that raising a warning in that case makes sense,
> and let's update the docs to reflect that it shouldn't be done (like
> what glibc/gcc do).

So there are two patches. The first fixes the logic when positional and ordered parameters are mixed and adds a warning in that situation. The second makes it possible to specify a width for the %s conversion.

I didn't finalize the documentation because my English skills are not good enough; probably a new paragraph about the "format" function is needed somewhere other than in the table.

Regards

Pavel
Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From: Pavel Stehule
Hello

Updated patches, following the changes for better variadic "any" functions.

Apply fix_mixing_positinal_ordered_placeholders_warnings_20130126.patch first.

Regards

Pavel
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From: Dean Rasheed
On 26 January 2013 10:58, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Updated patches, following the changes for better variadic "any"
> functions.
>
> Apply fix_mixing_positinal_ordered_placeholders_warnings_20130126.patch first.

Hi,

No one is listed as a reviewer for this patch so I thought I would take a look at it, since it looks like a useful enhancement to format().

Starting with the first patch - it issues a new WARNING if the format string contains a mixture of format specifiers with and without parameter indexes (e.g., 'Hello %s, %1$s').

Having thought about it a bit, I really don't like this for a number of reasons:

* I actually quite like the current behaviour. Admittedly putting ordered specifiers (like '%s') after positional ones (like '%3$s') is probably not so useful, and potentially open to different interpretations. But putting positional specifiers at the end is completely unambiguous and can save a lot of typing (e.g., '%s,%s,%s,%s,%s,%s,%s,%1$s').

* On backwards compatibility grounds. The fact that the only example of format() in the manual is precisely a case of mixed positional and ordered parameters makes it quite likely that people will have used this feature already.

* Part of the justification for adding the warning is for compatibility with glibc/SUS printf(). But if we are aiming for that, then we should also produce a warning if positional parameters are used and not all parameters are consumed. That would be a pain to implement and I don't think it would be particularly helpful in practice. Here is what the SUS says:

"""
The format can contain either numbered argument specifications (that is, %n$ and *m$), or unnumbered argument specifications (that is, % and *), but normally not both. The only exception to this is that %% can be mixed with the %n$ form. The results of mixing numbered and unnumbered argument specifications in a format string are undefined. When numbered argument specifications are used, specifying the Nth argument requires that all the leading arguments, from the first to the (N-1)th, are specified in the format string.
"""

I think that if we are going to do anything, we should explicitly document our current behaviour as a PostgreSQL extension to the SUS printf(), describing how we handle mixed parameters, rather than adding this warning.

The current PostgreSQL code isn't inconsistent with the SUS, except in the error case, and so can reasonably be regarded as an extension.

Regards,
Dean
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From: Pavel Stehule
Hello

2013/1/28 Dean Rasheed <dean.a.rasheed@gmail.com>:
> Starting with the first patch - it issues a new WARNING if the format
> string contains a mixture of format specifiers with and without
> parameter indexes (e.g., 'Hello %s, %1$s').
>
> Having thought about it a bit, I really don't like this for a number of
> reasons:
>
> I think that if we are going to do anything, we should explicitly
> document our current behaviour as a PostgreSQL extension to the SUS
> printf(), describing how we handle mixed parameters, rather than
> adding this warning.
>
> The current PostgreSQL code isn't inconsistent with the SUS, except in
> the error case, and so can reasonably be regarded as an extension.

I am not sure what you dislike: the warnings, or the redesign of the behavior?

I can live without the warnings, as long as this area is documented. It is not fully compatible with gcc, but gcc just raises warnings and implements the correct behavior. Our warnings are on a different level than gcc's warnings.

But I don't think the current implementation is correct:

-- our current behavior
postgres=# select format('%s %2$s %s', 'Hello', 'World');
ERROR: too few arguments for format
postgres=#
postgres=# select format('%s %1$s %s', 'Hello', 'World'); -- works

Ordered parameters should be independent of positional parameters, and that is how glibc behaves.

Regards

Pavel
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From: Tom Lane
Pavel Stehule <pavel.stehule@gmail.com> writes:
> 2013/1/28 Dean Rasheed <dean.a.rasheed@gmail.com>:
>> Starting with the first patch - it issues a new WARNING if the format
>> string contains a mixture of format specifiers with and without
>> parameter indexes (e.g., 'Hello %s, %1$s').
>>
>> Having thought about it a bit, I really don't like this for a number of reasons:

> I am not sure what you dislike: the warnings, or the redesign of the
> behavior?

Both. If we had done this when we first implemented format(), it'd be fine, but it's too late to change it now. There very likely are applications out there that depend on the current behavior. As Dean says, it's not incompatible with SUS, just a superset, so ISTM this patch is proposing to remove documented functionality --- for no very strong reason.

I vote for rejecting this change entirely.

regards, tom lane
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From: Stephen Frost
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Both. If we had done this when we first implemented format(), it'd be
> fine, but it's too late to change it now. There very likely are
> applications out there that depend on the current behavior. As Dean
> says, it's not incompatible with SUS, just a superset, so ISTM this
> patch is proposing to remove documented functionality --- for no very
> strong reason.

It's only a "superset" of the very poor subset of printf()-like functionality that we currently support through the format() function.

If we can actually match glibc/SUS (which I don't believe the initial patch did) and support a mix of explicitly specified arguments and implicit arguments, along with the various width, precision, and other format specifications, then fine by me.

I'm not convinced that's actually possible due to the ambiguity which will certainly arise, and I'm quite sure the documentation that explains what we do in each case will deserve its own chapter.

Thanks,

Stephen
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From: Dean Rasheed
On 28 January 2013 17:32, Stephen Frost <sfrost@snowman.net> wrote:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> Both. If we had done this when we first implemented format(), it'd be
>> fine, but it's too late to change it now. There very likely are
>> applications out there that depend on the current behavior. As Dean
>> says, it's not incompatible with SUS, just a superset, so ISTM this
>> patch is proposing to remove documented functionality --- for no very
>> strong reason.
>
> It's only a "superset" of the very poor subset of printf()-like
> functionality that we currently support through the format() function.
>
> If we can actually match glibc/SUS (which I don't believe the initial
> patch did) and support a mix of explicitly specified arguments and
> implicit arguments, along with the various width, precision, and other
> format specifications, then fine by me.
>
> I'm not convinced that's actually possible due to the ambiguity which
> will certainly arise, and I'm quite sure the documentation that explains
> what we do in each case will deserve its own chapter.

There are a number of separate issues here, but I don't see this as an intractable problem. In general a format specifier looks like:

%[parameter][flags][width][.precision][length]type

parameter - an optional n$. This is where we have implemented a superset of the SUS printf(). But I think it is a useful superset, and it's too late to remove it now. Any ambiguity lies here, where we go beyond the SUS - some printf() implementations appear to do something different (apparently without documenting what they do). I think our documentation could be clearer here, to explain how mixed parameters are handled.

flags - not currently implemented. Pavel's second patch adds support for the '-' flag for left-justified string output. However, I think this should support all datatypes (i.e., %I and %L as well as %s).

width - not currently implemented. Pavel's second patch adds support for this, but note that for full compatibility with the SUS it needs to also support widths specified using * and *n$. Also, I think it should support all supported datatypes, not just strings.

precision - only relevant to numeric datatypes, which we don't support.

length - only relevant to numeric datatypes, which we don't support.

type - this is where we only support a small subset of the SUS (plus a couple of SQL-specific types). I'm not sure if anyone has any plans to extend this, but that's certainly not on the cards for 9.3.

So the relevant pieces that Pavel's second patch is attempting to add support for are the '-' flag and the width field. As noted above, there are a couple of areas where the current patch falls short of the SUS:

1) The '-' flag and width field are supposed to apply to all types. I think that not supporting %I and %L will be somewhat limiting, and goes against the intent of the SUS, even though those types are PostgreSQL extensions.

2) The width field is supposed to support * (width specified by the next function argument) and *n$ (width specified by the nth function argument). If we supported this, then we could claim full compatibility with the SUS in all fields except for the type support, which would seem like a real step forward.

Regards,
Dean
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From
Pavel Stehule
Date:
2013/1/28 Dean Rasheed <dean.a.rasheed@gmail.com>: > On 28 January 2013 17:32, Stephen Frost <sfrost@snowman.net> wrote: >> * Tom Lane (tgl@sss.pgh.pa.us) wrote: >>> Both. If we had done this when we first implemented format(), it'd be >>> fine, but it's too late to change it now. There very likely are >>> applications out there that depend on the current behavior. As Dean >>> says, it's not incompatible with SUS, just a superset, so ISTM this >>> patch is proposing to remove documented functionality --- for no very >>> strong reason. >> >> It's only a "superset" of the very poor subset of printf()-like >> functionality that we currently support through the format() function. >> >> If we can actually match glibc/SUS (which I don't believe the initial >> patch did..) and support a mix of explicitly specified arguments and >> implicit arguments, along with the various width, precision, and other >> format specifications, then fine by me. >> >> I'm not convinced that's actually possible due to the ambiguity which >> will certainly arise and I'm quite sure the documentation that explains >> what we do in each case will deserve its own chapter. >> > > There are a number of separate issues here, but I don't see this as an > intractable problem. In general a format specifier looks like: > > %[parameter][flags][width][.precision][length]type > > parameter - an optional n$. This is where we have implemented a > superset of the SUS printf(). But I think it is a useful superset, and > it's too late to remove it now. Any ambiguity lies here, where we go > beyond the SUS - some printf() implementations appear to do something > different (apparently without documenting what they do). I think our > documentation could be clearer here, to explain how mixed parameters > are handled. > > flags - not currently implemented. Pavel's second patch adds support > for the '-' flag for left justified string output. 
However, I think > this should support all datatypes (i.e., %I and %L as well as %s). No, surely not: %I and %L are PostgreSQL extensions, and left or right alignment makes no sense for PostgreSQL identifiers and PostgreSQL literals. Regards Pavel > > width - not currently implemented. Pavel's second patch adds support > for this, but note that for full compatibility with the SUS it needs > to also support widths specified using * and *n$. Also, I think it > should support all supported datatypes, not just strings. > > precision - only relevant to numeric datatypes, which we don't support. > > length - only relevant to numeric datatypes, which we don't support. > > type - this is where we only support a small subset of the SUS (plus a > couple of SQL-specific types). I'm not sure if anyone has any plans to > extend this, but that's certainly not on the cards for 9.3. > > > So the relevant pieces that Pavel's second patch is attempting to add > support for are the '-' flag and the width field. As noted above, > there are a couple of areas where the current patch falls short of the > SUS: > > 1). The '-' flag and width field are supposed to apply to all types. I > think that not supporting %I and %L will be somewhat limiting, and > goes against the intent of the SUS, even though those types are > PostgreSQL extensions. > > 2). The width field is supposed to support * (width specified by the > next function argument) and *n$ (width specified by the nth function > argument). If we supported this, then we could claim full > compatibility with the SUS in all fields except for the type support, > which would seem like a real step forward. > > Regards, > Dean
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From
Dean Rasheed
Date:
On 28 January 2013 20:40, Pavel Stehule <pavel.stehule@gmail.com> wrote: > 2013/1/28 Dean Rasheed <dean.a.rasheed@gmail.com>: >> On 28 January 2013 17:32, Stephen Frost <sfrost@snowman.net> wrote: >>> * Tom Lane (tgl@sss.pgh.pa.us) wrote: >>>> Both. If we had done this when we first implemented format(), it'd be >>>> fine, but it's too late to change it now. There very likely are >>>> applications out there that depend on the current behavior. As Dean >>>> says, it's not incompatible with SUS, just a superset, so ISTM this >>>> patch is proposing to remove documented functionality --- for no very >>>> strong reason. >>> >>> It's only a "superset" of the very poor subset of printf()-like >>> functionality that we currently support through the format() function. >>> >>> If we can actually match glibc/SUS (which I don't believe the initial >>> patch did..) and support a mix of explicitly specified arguments and >>> implicit arguments, along with the various width, precision, and other >>> format specifications, then fine by me. >>> >>> I'm not convinced that's actually possible due to the ambiguity which >>> will certainly arise and I'm quite sure the documentation that explains >>> what we do in each case will deserve its own chapter. >>> >> >> There are a number of separate issues here, but I don't see this as an >> intractable problem. In general a format specifier looks like: >> >> %[parameter][flags][width][.precision][length]type >> >> parameter - an optional n$. This is where we have implemented a >> superset of the SUS printf(). But I think it is a useful superset, and >> it's too late to remove it now. Any ambiguity lies here, where we go >> beyond the SUS - some printf() implementations appear to do something >> different (apparently without documenting what they do). I think our >> documentation could be clearer here, to explain how mixed parameters >> are handled. >> >> flags - not currently implemented. 
Pavel's second patch adds support >> for the '-' flag for left justified string output. However, I think >> this should support all datatypes (i.e., %I and %L as well as %s). > > No, surely not: %I and %L are PostgreSQL extensions, and left or right > alignment makes no sense for PostgreSQL identifiers and PostgreSQL > literals. Left/right alignment and padding in printf() apply to all types, after the data value is converted to a string. Why shouldn't that same principle apply to %I and %L? The obvious use-case is for producing tabular output of data with columns neatly aligned. If we don't support %I and %L then any alignment of the columns to the right of them is lost. Regards, Dean
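Dean's point is that padding happens after the value has been converted to text, so the padding step does not need to know which conversion produced that text. The sketch below uses a hypothetical helper, pad_value, that receives the already-converted string and pads it with spaces. For %I or %L, that string would include any quotes the conversion added. The helper counts characters rather than bytes and assumes valid UTF-8 input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Count characters in a UTF-8 string by skipping continuation bytes (10xxxxxx). */
static size_t
utf8_char_count(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (((unsigned char) *s & 0xC0) != 0x80)
            n++;
    return n;
}

/*
 * Pad an already-converted value (the text produced for %s, %I or %L) with
 * spaces to at least 'width' characters. Returns false if 'out' is too small.
 */
static bool
pad_value(const char *value, int width, bool left_align, char *out, size_t outsz)
{
    size_t len, chars, pad;

    assert(value != NULL && out != NULL);
    len = strlen(value);
    chars = utf8_char_count(value);
    pad = (width > 0 && (size_t) width > chars) ? (size_t) width - chars : 0;

    if (len + pad + 1 > outsz)
        return false;
    if (left_align)
    {
        memcpy(out, value, len);        /* value, then trailing spaces */
        memset(out + len, ' ', pad);
    }
    else
    {
        memset(out, ' ', pad);          /* leading spaces, then value */
        memcpy(out + pad, value, len);
    }
    out[len + pad] = '\0';
    return true;
}
```

Because the function sees only the converted string, the same code works for %s, %I and %L. A value that is already wider than the requested width is copied unchanged.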
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From
Tom Lane
Date:
Dean Rasheed <dean.a.rasheed@gmail.com> writes: > On 28 January 2013 20:40, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> 2013/1/28 Dean Rasheed <dean.a.rasheed@gmail.com>: >>> flags - not currently implemented. Pavel's second patch adds support >>> for the '-' flag for left justified string output. However, I think >>> this should support all datatypes (i.e., %I and %L as well as %s). >> No, surely not: %I and %L are PostgreSQL extensions, and left or right >> alignment makes no sense for PostgreSQL identifiers and PostgreSQL >> literals. > Left/right alignment and padding in printf() apply to all types, after > the data value is converted to a string. Why shouldn't that same > principle apply to %I and %L? I agree with Dean --- it would be very strange for these features not to apply to all conversion specifiers (excepting %% of course, which isn't really a conversion specifier but an escaping hack). regards, tom lane
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From
Dean Rasheed
Date:
On 28 January 2013 20:32, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > In general a format specifier looks like: > > %[parameter][flags][width][.precision][length]type > This highlights another problem with the current implementation --- the '-' flag and the width field need to be parsed separately. So '%-3s' should be parsed as a '-' flag followed by a width of 3, not as a width of -3, which is then interpreted as left-aligned. This might seem like nitpicking, but actually it is important: * In the future we might support more flags, and they can be specified in any order, so the '-' flag won't necessarily come immediately before the width. * The width field is optional, even if the '-' flag is specified. So '%-s' is perfectly legal and should be interpreted as '%s'. The current implementation treats it as a width of 0, which is wrong. * The width field might not be a number, it might be something like * or *3$. Note that the SUS allows a negative width to be passed in as a function argument using this syntax, in which case it should be treated as if the '-' flag were specified. Regards, Dean
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From
Dean Rasheed
Date:
On 29 January 2013 08:19, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > * The width field is optional, even if the '-' flag is specified. So > '%-s' is perfectly legal and should be interpreted as '%s'. The > current implementation treats it as a width of 0, which is wrong. > Oh, but of course a width of 0 is the same as no width at all, so the current code is correct after all. That's what happens if I try to write emails before I've had my caffeine :-) I think my other points remain valid though. It would still be neater to parse the flags separately from the width field, and then all literal numbers that appear in the format should be positive. Regards, Dean
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From
Pavel Stehule
Date:
2013/1/28 Tom Lane <tgl@sss.pgh.pa.us>: > Pavel Stehule <pavel.stehule@gmail.com> writes: >> 2013/1/28 Dean Rasheed <dean.a.rasheed@gmail.com>: >>> Starting with the first patch - it issues a new WARNING if the format >>> string contains a mixture of format specifiers with and without >>> parameter indexes (e.g., 'Hello %s, %1$s'). >>> >>> Having thought about it a bit, I really don't like this for a number of reasons: > >> I am not sure what you dislike: >> the warnings, or the redesigned behavior? > > Both. If we had done this when we first implemented format(), it'd be > fine, but it's too late to change it now. There very likely are > applications out there that depend on the current behavior. As Dean > says, it's not incompatible with SUS, just a superset, so ISTM this > patch is proposing to remove documented functionality --- for no very > strong reason. I disagree, but I have no hard arguments. I think the current implementation is wrong, and now is the last chance to fix it: format() is not very old, so the impact on users should be relatively small. I didn't propose removing this functionality, but fixing it with a more logical, independent counter for ordered arguments. Depending on the previous positional argument is impractical and unclean. I am not satisfied with "it is undefined, so it is OK". Regards Pavel > > I vote for rejecting this change entirely. > > regards, tom lane
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From
Pavel Stehule
Date:
2013/1/28 Tom Lane <tgl@sss.pgh.pa.us>: > Dean Rasheed <dean.a.rasheed@gmail.com> writes: >> On 28 January 2013 20:40, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>> 2013/1/28 Dean Rasheed <dean.a.rasheed@gmail.com>: >>>> flags - not currently implemented. Pavel's second patch adds support >>>> for the '-' flag for left justified string output. However, I think >>>> this should support all datatypes (i.e., %I and %L as well as %s). > >>> No, surely not: %I and %L are PostgreSQL extensions, and left or right >>> alignment makes no sense for PostgreSQL identifiers and PostgreSQL >>> literals. > >> Left/right alignment and padding in printf() apply to all types, after >> the data value is converted to a string. Why shouldn't that same >> principle apply to %I and %L? > > I agree with Dean --- it would be very strange for these features not to > apply to all conversion specifiers (excepting %% of course, which isn't > really a conversion specifier but an escaping hack). OK, after some thinking I have no problem with it; it just means removing one check. Regards Pavel > > regards, tom lane
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From
Pavel Stehule
Date:
2013/1/29 Dean Rasheed <dean.a.rasheed@gmail.com>: > On 28 January 2013 20:32, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: >> In general a format specifier looks like: >> >> %[parameter][flags][width][.precision][length]type >> > > This highlights another problem with the current implementation --- > the '-' flag and the width field need to be parsed separately. So > '%-3s' should be parsed as a '-' flag followed by a width of 3, not as > a width of -3, which is then interpreted as left-aligned. This might > seem like nitpicking, but actually it is important: > > * In the future we might support more flags, and they can be specified > in any order, so the '-' flag won't necessarily come immediately > before the width. > > * The width field is optional, even if the '-' flag is specified. So > '%-s' is perfectly legal and should be interpreted as '%s'. The > current implementation treats it as a width of 0, which is wrong. > > * The width field might not be a number, it might be something like * > or *3$. Note that the SUS allows a negative width to be passed in as a > function argument using this syntax, in which case it should be > treated as if the '-' flag were specified. Support for specifying the width as * can be implemented in the future. The format() function was not designed to be fully compatible with the SUS; it is a simplified subset with PostgreSQL enhancements. There were discussions about integrating to_char() formats into format(), and we didn't rule that out. That is why I proposed and pushed the name "format" rather than "printf": its purpose may differ slightly from that of a generic printf function. Regards Pavel > > Regards, > Dean
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From
Pavel Stehule
Date:
Hello 2013/1/29 Dean Rasheed <dean.a.rasheed@gmail.com>: > On 29 January 2013 08:19, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: >> * The width field is optional, even if the '-' flag is specified. So >> '%-s' is perfectly legal and should be interpreted as '%s'. The >> current implementation treats it as a width of 0, which is wrong. >> > > Oh, but of course a width of 0 is the same as no width at all, so the > current code is correct after all. That's what happens if I try to > write emails before I've had my caffeine :-) > > I think my other points remain valid though. It would still be neater > to parse the flags separately from the width field, and then all > literal numbers that appear in the format should be positive. I am sending rewritten code. Indirect width ("*" and "*n$") is supported; it needed a little more code. There is a new question: what should the result of format('>>%2$*1$s<<', NULL, 'hello') be? It raises an exception now, but I can change it to whatever we agree on. Regards Pavel > > Regards, > Dean
Attachment
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From
Pavel Stehule
Date:
Hello, a minor update: fix alignment of NULL for %L. Regards Pavel 2013/1/31 Pavel Stehule <pavel.stehule@gmail.com>: > Hello > > 2013/1/29 Dean Rasheed <dean.a.rasheed@gmail.com>: >> On 29 January 2013 08:19, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: >>> * The width field is optional, even if the '-' flag is specified. So >>> '%-s' is perfectly legal and should be interpreted as '%s'. The >>> current implementation treats it as a width of 0, which is wrong. >>> >> >> Oh, but of course a width of 0 is the same as no width at all, so the >> current code is correct after all. That's what happens if I try to >> write emails before I've had my caffeine :-) >> >> I think my other points remain valid though. It would still be neater >> to parse the flags separately from the width field, and then all >> literal numbers that appear in the format should be positive. > > I am sending rewritten code. > > Indirect width ("*" and "*n$") is supported; it needed a little more code. > > There is a new question: > > what should the result of > > format('>>%2$*1$s<<', NULL, 'hello') > > be? > > It raises an exception now, but I can change it to whatever we agree on. > > Regards > > Pavel > >> >> Regards, >> Dean
Attachment
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From
Dean Rasheed
Date:
> 2013/1/31 Pavel Stehule <pavel.stehule@gmail.com>: >> I am sending rewritten code. Nice. I think this will be very useful, and it looks like it now supports everything that printf() does for %s format specifiers, and it's good that %I and %L behave the same. Also the code is looking cleaner. >> Indirect width ("*" and "*n$") is supported; it needed a little more code. >> >> There is a new question: >> >> what should the result of >> >> format('>>%2$*1$s<<', NULL, 'hello') >> >> be? My first thought is that a NULL width should be treated the same as no width at all (equivalent to width=0), rather than raising an exception. > a minor update: fix alignment of NULL for %L You need to do the same for a NULL value with %s, which currently produces an empty string regardless of the width. The documentation also needs to be updated. I'm thinking perhaps format() should now have its own separate sub-section in the manual, rather than trying to cram its docs into a single table row. I can help with the docs if you like. Regards, Dean
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From
Pavel Stehule
Date:
2013/2/9 Dean Rasheed <dean.a.rasheed@gmail.com>: >> 2013/1/31 Pavel Stehule <pavel.stehule@gmail.com>: >>> I am sending rewritten code. > > Nice. I think this will be very useful, and it looks like it now > supports everything that printf() does for %s format specifiers, and > it's good that %I and %L behave the same. Also the code is looking > cleaner. > >>> Indirect width ("*" and "*n$") is supported; it needed a little more code. >>> >>> There is a new question: >>> >>> what should the result of >>> >>> format('>>%2$*1$s<<', NULL, 'hello') >>> >>> be? > > My first thought is that a NULL width should be treated the same as no > width at all (equivalent to width=0), rather than raising an > exception. > >> a minor update: fix alignment of NULL for %L > > You need to do the same for a NULL value with %s, which currently > produces an empty string regardless of the width. Do others have the same opinion? Usually I don't like hiding NULLs, but this is a corner case (and a specific function), and I don't have a strong opinion on this issue. > > The documentation also needs to be updated. I'm thinking perhaps > format() should now have its own separate sub-section in the manual, > rather than trying to cram its docs into a single table row. I can > help with the docs if you like. Please write it, if you can. I am sure you will do it significantly better than me. Thank you Pavel > > Regards, > Dean
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From
Dean Rasheed
Date:
On 9 February 2013 18:30, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>>> There is a new question: >>>> >>>> what should the result of >>>> >>>> format('>>%2$*1$s<<', NULL, 'hello') >>>> >>>> be? >> >> My first thought is that a NULL width should be treated the same as no >> width at all (equivalent to width=0), rather than raising an >> exception. >> >>> a minor update: fix alignment of NULL for %L >> >> You need to do the same for a NULL value with %s, which currently >> produces an empty string regardless of the width. > > Do others have the same opinion? Usually I don't like hiding NULLs, but this is > a corner case (and a specific function), and I don't have a strong opinion on > this issue. > One use case for this might be something like SELECT format('%*s', minimum_width, value) FROM some_table; Throwing an exception when minimum_width is NULL doesn't seem particularly useful. Intuitively, it just means that row has no minimum width, so I think we should allow it. I think the case where the value is NULL is much more clear-cut. format('%s', NULL) produces '' (empty string). So format('%3s', NULL) should produce '   ' (three spaces). >> >> The documentation also needs to be updated. I'm thinking perhaps >> format() should now have its own separate sub-section in the manual, >> rather than trying to cram its docs into a single table row. I can >> help with the docs if you like. > > Please write it, if you can. I am sure you will do it significantly > better than me. > Here is my first draft. I've also attached the generated HTML page, because it's not so easy to read an SGML patch. Regards, Dean
Attachment
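The NULL rules settled on here are: a NULL width means no width, and a NULL value formats as an empty string that is still padded. They can be sketched as below. format_s is a hypothetical helper that models SQL NULLs as NULL pointers. To stay short, it only right-aligns and counts bytes rather than characters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Format one %s value under the proposed NULL rules:
 *  - width == NULL (SQL NULL width) means no width, like width 0;
 *  - value == NULL (SQL NULL value) formats as '', which is still padded.
 * Negative widths (left alignment) are not handled in this sketch.
 * Returns false if 'out' is too small.
 */
static bool
format_s(const char *value, const int *width, char *out, size_t outsz)
{
    const char *str = (value != NULL) ? value : "";
    int    w = (width != NULL) ? *width : 0;
    size_t len = strlen(str);
    size_t pad = (w > 0 && (size_t) w > len) ? (size_t) w - len : 0;

    assert(out != NULL);
    if (len + pad + 1 > outsz)
        return false;
    memset(out, ' ', pad);              /* right-align, as for %Ns */
    memcpy(out + pad, str, len);
    out[pad + len] = '\0';
    return true;
}
```

Under these rules, format('>>%2$*1$s<<', NULL, 'hello') would give '>>hello<<' rather than an error, and format('%3s', NULL) would give three spaces.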
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From
Pavel Stehule
Date:
2013/2/10 Dean Rasheed <dean.a.rasheed@gmail.com>: > On 9 February 2013 18:30, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>>>> There is a new question: >>>>> >>>>> what should the result of >>>>> >>>>> format('>>%2$*1$s<<', NULL, 'hello') >>>>> >>>>> be? >>> >>> My first thought is that a NULL width should be treated the same as no >>> width at all (equivalent to width=0), rather than raising an >>> exception. >>> >>>> a minor update: fix alignment of NULL for %L >>> >>> You need to do the same for a NULL value with %s, which currently >>> produces an empty string regardless of the width. >> >> Do others have the same opinion? Usually I don't like hiding NULLs, but this is >> a corner case (and a specific function), and I don't have a strong opinion on >> this issue. >> > > One use case for this might be something like > > SELECT format('%*s', minimum_width, value) FROM some_table; > > Throwing an exception when minimum_width is NULL doesn't seem > particularly useful. Intuitively, it just means that row has no > minimum width, so I think we should allow it. > > I think the case where the value is NULL is much more clear-cut. > format('%s', NULL) produces '' (empty string). So format('%3s', NULL) should > produce '   ' (three spaces). > OK, in that case I can accept NULL as meaning "ignore width". > >>> >>> The documentation also needs to be updated. I'm thinking perhaps >>> format() should now have its own separate sub-section in the manual, >>> rather than trying to cram its docs into a single table row. I can >>> help with the docs if you like. >> >> Please write it, if you can. I am sure you will do it significantly >> better than me. >> > > Here is my first draft. I've also attached the generated HTML page, > because it's not so easy to read an SGML patch. > Nice. I have only one point: I think the format function should be in table 9-6, with a short description and a reference to the special section. 
Regards Pavel > Regards, > Dean
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From
Dean Rasheed
Date:
On 10 February 2013 12:37, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> Here is my first draft. I've also attached the generated HTML page, >> because it's not so easy to read an SGML patch. >> > > Nice. > > I have only one point: I think the format function should be in > table 9-6, with a short description and a reference to the special section. > It is already there in table 9-6, referring to the new section. Here is a minor update though -- I changed the name of the first optional argument from "str" to "formatarg", since they are no longer necessarily strings. Regards, Dean
Attachment
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From
Pavel Stehule
Date:
Hello updated patch * merged Dean's doc * allow NULL as width Regards Pavel 2013/2/11 Dean Rasheed <dean.a.rasheed@gmail.com>: > On 10 February 2013 12:37, Pavel Stehule <pavel.stehule@gmail.com> wrote: >>> Here is my first draft. I've also attached the generated HTML page, >>> because it's not so easy to read an SGML patch. >>> >> >> Nice. >> >> I have only one point: I think the format function should be in >> table 9-6, with a short description and a reference to the special section. >> > > It is already there in table 9-6, referring to the new section. > > Here is a minor update though -- I changed the name of the first > optional argument from "str" to "formatarg", since they are no longer > necessarily strings. > > Regards, > Dean
Attachment
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From
Dean Rasheed
Date:
On 11 February 2013 14:29, Pavel Stehule <pavel.stehule@gmail.com> wrote: > Hello > > updated patch > > * merged Dean's doc > * allow NULL as width > Hi, I have not had time to look at this properly, but it doesn't look as though you have fixed the other problem I mentioned up-thread, with %s for NULL values: SELECT format('|%s|', NULL); Result: || SELECT format('|%5s|', NULL); Result: || In the second case, I think it should produce |     | (five spaces). Regards, Dean
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From
Pavel Stehule
Date:
Hello 2013/2/13 Dean Rasheed <dean.a.rasheed@gmail.com>: > On 11 February 2013 14:29, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> Hello >> >> updated patch >> >> * merged Dean's doc >> * allow NULL as width >> > > Hi, > I have not had time to look at this properly, but it doesn't look as > though you have fixed the other problem I mentioned up-thread, with %s > for NULL values: > > SELECT format('|%s|', NULL); > Result: || > SELECT format('|%5s|', NULL); > Result: || > > In the second case, I think it should produce |     | (five spaces). Fixed. Regards Pavel Stehule > > Regards, > Dean
Attachment
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From
Kyotaro HORIGUCHI
Date:
Hello, may I review this patch? > >> * merged Dean's doc > >> * allow NULL as width I understand that this patch aims to be a pure extension of format()'s current behavior, mimicking printf as specified by the SUS and implemented by glibc (*1). (*1) http://pubs.opengroup.org/onlinepubs/009695399/functions/printf.html This patch seems to preserve the behavior of the current implementation, and it succeeds in mimicking almost all of the SUS, apart from some very subtle differences. Attached is a new patch, which I've edited following the comments below, with some typo fixes and a few added regression tests. If you have no problem with this, I'll send it to a committer. What do you think? My comments are below. ====================================== The following is a comment about the behavior. - The minus ('-') is a flag, not a sign or an operator, so it seems it may appear more than once. For example, printf(">>%-------10s<<", "hoge") safely yields ">>hoge______<<". This is consistent with the behavior when a negative value is supplied to a '-*' conversion. The following are some comments about the code. In text_format_parse_digits: - is_valid seems to be the primary result, so returning it as the function's return value would make the caller simpler. - Although the compiler should handle it properly, I don't think it is good practice to use the memory pointed to by function parameters as local working storage. *inum and *is_valid in the while loop should be replaced with local variables, and the out parameters set once the values are settled. For TEXT_FORMAT_NEXT_CHAR: - This macro name is somewhat confusing, and the macro could also be used in text_format_parse_digits. I propose FORWARD_PARSE_POINT instead. I also removed end_ptr from the macro parameters, but I'm not sure whether that is appropriate. For text_format_parse_format: - Using start_ptr as a working pointer makes the name inappropriate. - The out parameters seem somewhat redundant.
indirect_width and indirect_width_parameter could be merged, using 0 to indicate an unnumbered argument. For text_format: - The maximum number of function arguments is limited to FUNC_MAX_ARGS (100), so I suppose there is no need to worry about the argument index wrapping around. - The following lines seem somewhat confusing: | /* Not enough arguments? Deduct 1 to avoid counting format string. */ | if (arg > nargs - 1) This expression has no special meaning: an index into a zero-based array simply must not be equal to or larger than the number of its elements. If that's not your intent, some rewriting would be needed. - Only int4 is read directly for the width value in the latest patch, but int2 could also be read directly, and that should be supported. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From
Kyotaro HORIGUCHI
Date:
Umm, sorry. > If you have no problem with this, I'll send it to a committer. I just found that this patch already has a reviewer; I had only looked at the Status field in the patch list. Should I leave this to you, Dean? -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From
Pavel Stehule
Date:
Hello, I have no objections. Thank you for the update. Regards Pavel 2013/2/28 Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>: > Hello, may I review this patch? > >> >> * merged Dean's doc >> >> * allow NULL as width > > I understand that this patch aims to be a pure extension of format()'s > current behavior, mimicking printf as specified by the SUS and > implemented by glibc (*1). > > (*1) http://pubs.opengroup.org/onlinepubs/009695399/functions/printf.html > > This patch seems to preserve the behavior of the current > implementation, and it succeeds in mimicking almost all of the SUS, > apart from some very subtle differences. > > Attached is a new patch, which I've edited following the > comments below, with some typo fixes and a few added regression tests. > > If you have no problem with this, I'll send it to a committer. > > What do you think? > > > My comments are below. > > ====================================== > The following is a comment about the behavior. > > - The minus ('-') is a flag, not a sign or an operator, so it > seems it may appear more than once. For example, > printf(">>%-------10s<<", "hoge") safely yields > ">>hoge______<<". This is consistent with the behavior > when a negative value is supplied to a '-*' conversion. > > > The following are some comments about the code. > > In text_format_parse_digits: > > - is_valid seems to be the primary result, so returning it > as the function's return value would make the caller > simpler. > > - Although the compiler should handle it properly, I don't > think it is good practice to use the memory pointed to by function > parameters as local working storage. *inum and *is_valid in > the while loop should be replaced with local variables, and > the out parameters set once the values are settled. > > For TEXT_FORMAT_NEXT_CHAR: > > - This macro name is somewhat confusing, and the macro could also be > used in text_format_parse_digits. I propose > FORWARD_PARSE_POINT instead. 
I also removed end_ptr from the > macro parameters, but I'm not sure whether that is appropriate. > > For text_format_parse_format: > > - Using start_ptr as a working pointer makes the name > inappropriate. > > - The out parameters seem somewhat redundant. indirect_width and > indirect_width_parameter could be merged, using 0 to indicate > an unnumbered argument. > > For text_format: > > - The maximum number of function arguments is limited to FUNC_MAX_ARGS > (100), so I suppose there is no need to worry about the argument index > wrapping around. > > - The following lines seem somewhat confusing: > > | /* Not enough arguments? Deduct 1 to avoid counting format string. */ > | if (arg > nargs - 1) > > This expression has no special meaning: an index into a zero-based > array simply must not be equal to or larger than the number of its > elements. If that's not your intent, some rewriting would be needed. > > - Only int4 is read directly for the width value in the latest > patch, but int2 could also be read directly, and that should > be supported. > > regards, > > -- > Kyotaro Horiguchi > NTT Open Source Software Center >
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From
Dean Rasheed
Date:
On 28 February 2013 11:25, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Umm, sorry,
>
>> If you have no problem with this, I'll send this to committer.
>
> I just found that this patch already has a reviewer. I had looked
> only at the Status field in the patch list..
>
> Should I leave this to you, Dean?
>

Sorry, I've been meaning to review this properly for some time, but
I've been swamped with other work, so I'm happy for you to take over.

My overall impression is that the patch is in good shape, and provides
valuable new functionality, and it is probably close to being ready
for committer.

I think that the only other behavioural glitch I spotted was that it
fails to catch one overflow case, which should generate an "out of
range" error:

select format('|%*s|', -2147483648, 'foo');
Result: |foo|

because -(-2147483648) = 0 in signed 32-bit integers.

Apart from that, I didn't find any problems during my testing.

Thanks for your review.

Regards,
Dean
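Dean's case comes from negating the width. Strictly, -INT_MIN is undefined behaviour for a C int; on typical two's-complement targets it simply comes back as -2147483648, which is still negative, so no padding gets applied and |foo| is printed. A minimal sketch of the INT_MIN comparison proposed later in this thread, kept outside PostgreSQL: normalize_width is a hypothetical helper that splits a signed width into a magnitude and an alignment flag, and returns false where the patch would raise an "out of range" error.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Split a signed width (negative means "left-align") into its magnitude
 * and an alignment flag.  Returns false if the width cannot be negated.
 */
static bool
normalize_width(int width, bool minus_flag, int *abs_width, bool *left_align)
{
    assert(abs_width != NULL && left_align != NULL);

    /* -INT_MIN is not representable as an int, so refuse it before negating. */
    if (width == INT_MIN)
        return false;

    *left_align = minus_flag || width < 0;
    *abs_width = (width < 0) ? -width : width;
    return true;
}
```

Comparing against INT_MIN up front avoids relying on how the platform happens to wrap, which is the concern Kyotaro raises about a negate-then-check-sign approach.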
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From
Pavel Stehule
Date:
Hello

2013/2/28 Dean Rasheed <dean.a.rasheed@gmail.com>:
> On 28 February 2013 11:25, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> Umm, sorry,
>>
>>> If you have no problem with this, I'll send this to committer.
>>
>> I just found that this patch already has a reviewer. I had looked
>> only at the Status field in the patch list..
>>
>> Should I leave this to you, Dean?
>>
>
> Sorry, I've been meaning to review this properly for some time, but
> I've been swamped with other work, so I'm happy for you to take over.
>
> My overall impression is that the patch is in good shape, and provides
> valuable new functionality, and it is probably close to being ready
> for committer.
>
> I think that the only other behavioural glitch I spotted was that it
> fails to catch one overflow case, which should generate an "out of
> range" error:
>
> select format('|%*s|', -2147483648, 'foo');
> Result: |foo|
>
> because -(-2147483648) = 0 in signed 32-bit integers.

fixed - also added another overflow check

Regards

Pavel

>
> Apart from that, I didn't find any problems during my testing.
>
> Thanks for your review.
>
> Regards,
> Dean
Attachment
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From
Kyotaro HORIGUCHI
Date:
Hello,

> > I think that the only other behavioural glitch I spotted was that it
> > fails to catch one overflow case, which should generate an "out of
> > range" error:
> >
> > select format('|%*s|', -2147483648, 'foo');
> > Result: |foo|
> >
> > because -(-2147483648) = 0 in signed 32-bit integers.

Ouch. Thanks for pointing that out.

> fixed - also added another overflow check

Your change shown below seems to assume that the two's complement
of the most negative number in an integer type is identical to
itself, and it does appear to work as expected, at least on
linux/x86_64. But the C standard leaves this undefined (as far as
I have heard :-).

| if (width != 0)
| {
|     int32 _width = -width;
|
|     if (SAMESIGN(width, _width))
|         ereport(ERROR,

Instead, I think we can reject it by simply comparing with
INT_MIN. I modified the patch like so and made some styling
modifications.

Finally, enlargeStringInfo fails right afterwards for large
numbers. Perhaps we should keep the width under a certain length
to avoid memory exhaustion.

Anyway, I'll send this patch to committers as it is in this
message.

best wishes,

--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 9b7e967..b2d2ed6 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -1519,21 +1519,13 @@ <primary>format</primary> </indexterm> <literal><function>format</function>(<parameter>formatstr</parameter><type>text</type> - [, <parameter>str</parameter> <type>"any"</type> [, ...] ])</literal> + [, <parameter>formatarg</parameter> <type>"any"</type> [, ...] ])</literal> </entry> <entry><type>text</type></entry> <entry> Format arguments according to a format string.
- This function is similar to the C function - <function>sprintf</>, but only the following conversion specifications - are recognized: <literal>%s</literal> interpolates the corresponding - argument as a string; <literal>%I</literal> escapes its argument as - an SQL identifier; <literal>%L</literal> escapes its argument as an - SQL literal; <literal>%%</literal> outputs a literal <literal>%</>. - A conversion can reference an explicit parameter position by preceding - the conversion specifier with <literal><replaceable>n</>$</>, where - <replaceable>n</replaceable> is the argument position. - See also <xref linkend="plpgsql-quote-literal-example">. + This function is similar to the C function <function>sprintf</>. + See <xref linkend="functions-string-format">. </entry> <entry><literal>format('Hello %s, %1$s', 'World')</literal></entry> <entry><literal>Hello World, World</literal></entry> @@ -2847,6 +2839,186 @@ </tgroup> </table> + <sect2 id="functions-string-format"> + <title><function>format</function></title> + + <indexterm> + <primary>format</primary> + </indexterm> + + <para> + The function <function>format</> produces formatted output according to + a format string in a similar way to the C function <function>sprintf</>. + </para> + + <para> +<synopsis> +format(<parameter>formatstr</> <type>text</> [, <parameter>formatarg</> <type>"any"</> [, ...] ]) +</synopsis> + <replaceable>formatstr</> is a format string that specifies how the + result should be formatted. Text in the format string is copied directly + to the result, except where <firstterm>format specifiers</> are used. + Format specifiers act as placeholders in the string, allowing subsequent + function arguments to be formatted and inserted into the result. 
+ </para> + + <para> + Format specifiers are introduced by a <literal>%</> character and take + the form +<synopsis> +%[<replaceable>parameter</>][<replaceable>flags</>][<replaceable>width</>]<replaceable>type</> +</synopsis> + <variablelist> + <varlistentry> + <term><replaceable>parameter</replaceable> (optional)</term> + <listitem> + <para> + An expression of the form <literal><replaceable>n</>$</> where + <replaceable>n</> is the index of the argument to use for the format + specifier's value. An index of 1 means the first argument after + <replaceable>formatstr</>. If the <replaceable>parameter</> field is + omitted, the default is to use the next argument. + </para> +<screen> +SELECT format('Testing %s, %s, %s', 'one', 'two', 'three'); +<lineannotation>Result: </><computeroutput>Testing one, two, three</> + +SELECT format('Testing %3$s, %2$s, %1$s', 'one', 'two', 'three'); +<lineannotation>Result: </><computeroutput>Testing three, two, one</> +</screen> + + <para> + Note that unlike the C function <function>sprintf</> defined in the + Single UNIX Specification, the <function>format</> function in + <productname>PostgreSQL</> allows format specifiers with and without + explicit <replaceable>parameter</> fields to be mixed in the same + format string. A format specifier without a + <replaceable>parameter</> field always uses the next argument after + the last argument consumed. In addition, the + <productname>PostgreSQL</> <function>format</> function does not + require all function arguments to be referred to in the format + string. + </para> +<screen> +SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); +<lineannotation>Result: </><computeroutput>Testing three, two, three</> +</screen> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable>flags</replaceable> (optional)</term> + <listitem> + <para> + Additional options controlling how the format specifier's output is + formatted. 
Currently the only supported flag is an minus sign + (<literal>-</>) which will cause the format specifier's output to be + left-aligned. This has no effect unless the <replaceable>width</> + field is also specified. + </para> +<screen> +SELECT format('|%10s|%-10s|', 'foo', 'bar'); +<lineannotation>Result: </><computeroutput>| foo|bar |</> +</screen> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable>width</replaceable> (optional)</term> + <listitem> + <para> + Specifies the <emphasis>minimum</> number of characters to use to + display the format specifier's output. The width may be specified + using any of the following: a positive integer; an asterisk + (<literal>*</>) to use the next function argument as the width; or an + expression of the form <literal>*<replaceable>n</>$</> to use the + <replaceable>n</>th function argument as the width. + </para> + + <para> + If the width comes from a function argument, that argument is + consumed <emphasis>before</> the argument that is used for the format + specifier's value. If the width argument is negative, the result is + left aligned, as if the <literal>-</> flag had been specified. 
+ </para> +<screen> +SELECT format('|%10s|', 'foo'); +<lineannotation>Result: </><computeroutput>| foo|</> + +SELECT format('|%*s|', 10, 'foo'); +<lineannotation>Result: </><computeroutput>| foo|</> + +SELECT format('|%*s|', -10, 'foo'); +<lineannotation>Result: </><computeroutput>|foo |</> + +SELECT format('|%-*s|', 10, 'foo'); +<lineannotation>Result: </><computeroutput>|foo |</> + +SELECT format('|%-*s|', -10, 'foo'); +<lineannotation>Result: </><computeroutput>|foo |</> + +SELECT format('|%*2$s|', 'foo', 10, 'bar'); +<lineannotation>Result: </><computeroutput>| bar|</> + +SELECT format('|%3$*2$s|', 'foo', 10, 'bar'); +<lineannotation>Result: </><computeroutput>| bar|</> +</screen> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable>type</replaceable> (required)</term> + <listitem> + <para> + The type of format conversion to use to produce the format + specifier's output. The following types are supported: + <itemizedlist> + <listitem> + <para> + <literal>s</literal> formats the argument value as a simple + string. A null value is treated as an empty string. + </para> + </listitem> + <listitem> + <para> + <literal>I</literal> escapes the value as an SQL identifier. It + is an error for the value to be null. + </para> + </listitem> + <listitem> + <para> + <literal>L</literal> escapes the value as an SQL literal. A null + value is displayed as the literal value <literal>NULL</>. 
+ </para> + </listitem> + </itemizedlist> + </para> +<screen> +SELECT format('Hello %s', 'World'); +<lineannotation>Result: </lineannotation><computeroutput>Hello World</computeroutput> + +SELECT format('DROP TABLE %I', 'Foo bar'); +<lineannotation>Result: </lineannotation><computeroutput>DROP TABLE "Foo bar"</computeroutput> + +SELECT format('SELECT %L', E'O\'Reilly'); +<lineannotation>Result: </lineannotation><computeroutput>SELECT 'O''Reilly'</computeroutput> +</screen> + + <para> + The <literal>%I</> and <literal>%L</> format specifiers may be used + to safely construct dynamic SQL statements. See + <xref linkend="plpgsql-quote-literal-example">. + </para> + </listitem> + </varlistentry> + </variablelist> + </para> + + <para> + In addition to the format specifiers above, the special escape sequence + <literal>%%</> may be used to output a literal <literal>%</> character. + </para> + </sect2> </sect1> diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c index e69b7dd..19b8049 100644 --- a/src/backend/utils/adt/varlena.c +++ b/src/backend/utils/adt/varlena.c @@ -78,7 +78,8 @@ static bytea *bytea_overlay(bytea *t1, bytea *t2, int sp, int sl);static StringInfo makeStringAggState(FunctionCallInfofcinfo);static void text_format_string_conversion(StringInfo buf, char conversion, FmgrInfo *typOutputInfo, - Datum value, bool isNull); + Datum value, bool isNull, + int flags, int width);static Datum text_to_array_internal(PG_FUNCTION_ARGS);static text *array_to_text_internal(FunctionCallInfofcinfo, ArrayType *v, const char *fldsep, const char *null_string); @@ -3996,6 +3997,135 @@ text_reverse(PG_FUNCTION_ARGS) PG_RETURN_TEXT_P(result);} +#define FORWARD_PARSE_POINT(ptr) \ +do { \ + if (++(ptr) >= (end_ptr)) \ + ereport(ERROR, \ + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), \ + errmsg("unterminated conversion specifier"))); \ +} while (0) + +/* + * Parse congiguous digits into decimal number. 
+ * + * Returns true if some digits could be parsed and *ptr moved to the next + * character to be parsed. The value is returned into *value. + */ +static bool +text_format_parse_digits(const char **ptr, const char *end_ptr, int *value) +{ + const char *cp = *ptr; + int wval = 0; + bool found; + + /* + * continue, only when start_ptr is less than end_ptr. + * Overrun of cp is checked in FORWARD_PARSE_POINT. + */ + while (*cp >= '0' && *cp <= '9') + { + int newnum = wval * 10 + (*cp - '0'); + + if (newnum / 10 != wval) /* overflow? */ + ereport(ERROR, + (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), + errmsg("number is out of range"))); + wval = newnum; + FORWARD_PARSE_POINT(cp); + } + + found = (cp > *ptr); + *value = wval; + *ptr = cp; + + return found; +} + +#define TEXT_FORMAT_FLAG_MINUS 0x0001 /* is minus in format string? */ + +/* + * parse format specification + * [argpos][flags][width]type + * + * Return values are, + * static const char * : Address to be parsed next. + * valarg : argument position for value to be printed. -1 means missing. + * widtharg : argument position for width. Zero means that argument position + * is not specified and -1 means missing. + * flags : flags + * width : the value for direct width specification, zero means that width + * is not specified. 
+ */ +static const char * +text_format_parse_format(const char *start_ptr, const char *end_ptr, + int *valarg, int *widtharg, int *flags, int *width) +{ + const char *cp = start_ptr; + int n; + + /* set defaults to out parameters */ + *valarg = -1; + *widtharg = -1; + *flags = 0; + *width = 0; + + /* try to identify first number */ + if (text_format_parse_digits(&cp, end_ptr, &n)) + { + if (*cp != '$') + { + *width = n; /* The number should be width */ + return cp; + } + /* Explicit 0 for argument index is immediately refused */ + if (n == 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("conversion specifies argument 0, but arguments are numbered from 1"))); + *valarg = n; /* The number was argument position */ + FORWARD_PARSE_POINT(cp); + } + + /* Check for flags, only minus is supported now. */ + while (*cp == '-') + { + *flags = *flags | TEXT_FORMAT_FLAG_MINUS; + FORWARD_PARSE_POINT(cp); + } + + /* try to parse indirect width */ + if (*cp == '*') + { + FORWARD_PARSE_POINT(cp); + + if (text_format_parse_digits(&cp, end_ptr, &n)){ + /* number in this position should be closed by $ */ + if (*cp != '$') + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("unexpected char \"%c\".",*cp))); + FORWARD_PARSE_POINT(cp); + + /* Explicit 0 for argument index is immediately refused */ + if (n == 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("conversion specifies argument 0, but arguments are numbered from 1"))); + *widtharg = n; + } + else + *widtharg = 0; /* 0 means argument position is not specified */ + + return cp; + } + + /* last possible number - width */ + if (text_format_parse_digits(&cp, end_ptr, &n)) + *width = n; + + return cp; +} +/* * Returns a formated string */ @@ -4016,6 +4146,8 @@ text_format(PG_FUNCTION_ARGS) Oid element_type = InvalidOid; Oid prev_type= InvalidOid; FmgrInfo typoutputfinfo; + FmgrInfo typoutputinfo_width; + Oid prev_type_width = InvalidOid; /* When format string is 
null, returns null */ if (PG_ARGISNULL(0)) @@ -4077,7 +4209,7 @@ text_format(PG_FUNCTION_ARGS) } /* Setup for main loop. */ - fmt = PG_GETARG_TEXT_PP(0); + fmt = PG_GETARG_TEXT_PP(arg++); start_ptr = VARDATA_ANY(fmt); end_ptr = start_ptr + VARSIZE_ANY_EXHDR(fmt); initStringInfo(&str); @@ -4088,6 +4220,10 @@ text_format(PG_FUNCTION_ARGS) Datum value; bool isNull; Oid typid; + int valarg; + int widtharg; + int flags; + int width; /* * If it's not the start of a conversion specifier, just copy it to @@ -4099,11 +4235,7 @@ text_format(PG_FUNCTION_ARGS) continue; } - /* Did we run off the end of the string? */ - if (++cp >= end_ptr) - ereport(ERROR, - (errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("unterminated conversion specifier"))); + FORWARD_PARSE_POINT(cp); /* Easy case: %% outputs a single % */ if (*cp == '%') @@ -4112,69 +4244,84 @@ text_format(PG_FUNCTION_ARGS) continue; } - /* - * If the user hasn't specified an argument position, we just advance - * to the next one. If they have, we must parse it. - */ - if (*cp < '0' || *cp > '9') + cp = text_format_parse_format(cp, end_ptr, + &valarg, &widtharg, &flags, &width); + + if (widtharg >= 0) { - ++arg; - if (arg <= 0) /* overflow? */ - { - /* - * Should not happen, as you can't pass billions of arguments - * to a function, but better safe than sorry. - */ + if (widtharg > 0) + /* be consistent, move ordered argument together with + * positional */ + arg = widtharg; + + if (arg >= nargs) ereport(ERROR, - (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), - errmsg("argument number is out of range"))); - } - } - else - { - bool unterminated = false; + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("too few arguments for format"))); - /* Parse digit string. 
*/ - arg = 0; - do + if (!funcvariadic) { - int newarg = arg * 10 + (*cp - '0'); + value = PG_GETARG_DATUM(arg); + isNull = PG_ARGISNULL(arg); + typid = get_fn_expr_argtype(fcinfo->flinfo, arg); + } + else + { + value = elements[arg - 1]; + isNull = nulls[arg - 1]; + typid = element_type; + } + if (!OidIsValid(typid)) + elog(ERROR, "could not determine data type of format() input"); - if (newarg / 10 != arg) /* overflow? */ - ereport(ERROR, - (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), - errmsg("argument number is out of range"))); - arg = newarg; - ++cp; - } while (cp < end_ptr && *cp >= '0' && *cp <= '9'); + arg++; /* - * If we ran off the end, or if there's not a $ next, or if the $ - * is the last character, the conversion specifier is improperly - * terminated. + * we don't need to different between NULL and zero in this moment, + * NULL means ignore this width - same as zero. */ - if (cp == end_ptr || *cp != '$') - unterminated = true; + if (isNull) + width = 0; + else if (typid == INT4OID) + width = DatumGetInt32(value); + else if (typid == INT2OID) + width = DatumGetInt16(value); else { - ++cp; - if (cp == end_ptr) - unterminated = true; - } - if (unterminated) - ereport(ERROR, - (errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("unterminated conversion specifier"))); + char *str; - /* There's no argument 0. */ - if (arg == 0) - ereport(ERROR, - (errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("conversion specifies argument 0, but arguments are numbered from 1"))); + /* simple IO cast to int */ + if (typid != prev_type_width) + { + Oid typoutputfunc; + bool typIsVarlena; + + getTypeOutputInfo(typid, &typoutputfunc, &typIsVarlena); + fmgr_info(typoutputfunc, &typoutputinfo_width); + prev_type_width = typid; + } + + /* Stringify. */ + str = OutputFunctionCall(&typoutputinfo_width, value); + + /* get int value */ + width = pg_atoi(str, sizeof(int32), '\0'); + pfree(str); + } } - /* Not enough arguments? Deduct 1 to avoid counting format string. 
*/ - if (arg > nargs - 1) + /* We calculate -width later but -INT_MIN is undefined for int. */ + if (width <= INT_MIN) + ereport(ERROR, + (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), + errmsg("number is out of range"))); + + if (valarg >= 0) + /* be consistent, move ordered argument together with + * positional */ + arg = valarg; + + if (arg >= nargs) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("too few arguments for format"))); @@ -4195,6 +4342,8 @@ text_format(PG_FUNCTION_ARGS) if (!OidIsValid(typid)) elog(ERROR, "could not determinedata type of format() input"); + arg++; + /* * Get the appropriate typOutput function, reusing previous one if * same type as previous argument. That's particularly useful in the @@ -4221,7 +4370,7 @@ text_format(PG_FUNCTION_ARGS) case 'I': case 'L': text_format_string_conversion(&str,*cp, &typoutputfinfo, - value, isNull); + value, isNull, flags, width); break; default: ereport(ERROR, @@ -4244,23 +4393,65 @@ text_format(PG_FUNCTION_ARGS) PG_RETURN_TEXT_P(result);} +/* + * Add spaces on begin or on end when it is necessary + */ +static void +text_format_append_string(StringInfo buf, const char *str, + int flags, int width) +{ + bool align_to_left = false; + int len; + + /* fast path */ + if (width == 0) + { + appendStringInfoString(buf, str); + return; + } + else if (width < 0 || (flags & TEXT_FORMAT_FLAG_MINUS)) + { + align_to_left = true; + if (width < 0) + width = -width; + } + + len = pg_mbstrlen(str); + if (align_to_left) + { + appendStringInfoString(buf, str); + if (len < width) + appendStringInfoSpaces(buf, width - len); + } + else + { + /* align_to_right */ + if (len < width) + appendStringInfoSpaces(buf, width - len); + appendStringInfoString(buf, str); + } +} +/* Format a %s, %I, or %L conversion. 
*/static voidtext_format_string_conversion(StringInfo buf, char conversion, FmgrInfo *typOutputInfo, - Datum value, bool isNull) + Datum value, bool isNull, + int flags, int width){ char *str; - /* Handle NULL arguments before trying to stringify the value. */ if (isNull) { - if (conversion == 'L') - appendStringInfoString(buf, "NULL"); + if (conversion == 's') + text_format_append_string(buf, "", flags, width); + else if (conversion == 'L') + text_format_append_string(buf, "NULL", flags, width); else if (conversion == 'I') ereport(ERROR, (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), errmsg("null values cannot be formattedas an SQL identifier"))); + return; } @@ -4271,18 +4462,18 @@ text_format_string_conversion(StringInfo buf, char conversion, if (conversion == 'I') { /* quote_identifier may or may not allocate a new string. */ - appendStringInfoString(buf, quote_identifier(str)); + text_format_append_string(buf, quote_identifier(str), flags, width); } else if (conversion == 'L') { char *qstr = quote_literal_cstr(str); - appendStringInfoString(buf, qstr); + text_format_append_string(buf, qstr, flags, width); /* quote_literal_cstr() always allocates a new string*/ pfree(qstr); } else - appendStringInfoString(buf, str); + text_format_append_string(buf, str, flags, width); /* Cleanup. 
*/ pfree(str); diff --git a/src/test/regress/expected/text.out b/src/test/regress/expected/text.out index b756583..e05a1e5 100644 --- a/src/test/regress/expected/text.out +++ b/src/test/regress/expected/text.out @@ -256,12 +256,20 @@ select format('%1$s %4$s', 1, 2, 3);ERROR: too few arguments for formatselect format('%1$s %13$s',1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12);ERROR: too few arguments for format +select format('%0$s', 'Hello'); +ERROR: conversion specifies argument 0, but arguments are numbered from 1 +select format('%*0$s', 'Hello'); +ERROR: conversion specifies argument 0, but arguments are numbered from 1select format('%1s', 1); -ERROR: unterminated conversion specifier + format +-------- + 1 +(1 row) +select format('%1$', 1);ERROR: unterminated conversion specifierselect format('%1$1', 1); -ERROR: unrecognized conversion specifier "1" +ERROR: unterminated conversion specifier-- check mix of positional and ordered placeholdersselect format('Hello %s %1$s%s', 'World', 'Hello again'); format @@ -328,3 +336,74 @@ from generate_series(1,200) g(i); 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200(1 row) +-- left, right align +select format('>>%10s<<', 'Hello') +union all +select format('>>%10s<<', NULL) +union all +select format('>>%10s<<', '') +union all +select format('>>%-10s<<', '') +union all +select format('>>%-10s<<', 
'Hello') +union all +select format('>>%-10s<<', NULL) +union all +select format('>>%1$10s<<', 'Hello') +union all +select format('>>%1$-10I<<', 'Hello') +union all +select format('>>%2$*1$L<<', 10, 'Hello') +union all +select format('>>%2$*1$L<<', 10, NULL) +union all +select format('>>%2$*1$L<<', -10, NULL) +union all +select format('>>%*s<<', 10, 'Hello'); + format +---------------- + >> Hello<< + >> << + >> << + >> << + >>Hello << + >> << + >> Hello<< + >>"Hello" << + >> 'Hello'<< + >> NULL<< + >>NULL << + >> Hello<< +(12 rows) + +select format('>>%*1$s<<', 10, 'Hello'); + format +---------------- + >> Hello<< +(1 row) + +select format('>>%-s<<', 'Hello'); + format +----------- + >>Hello<< +(1 row) + +-- NULL is not different to zero here +select format('>>%10L<<', NULL); + format +---------------- + >> NULL<< +(1 row) + +select format('>>%2$*1$L<<', NULL, 'Hello'); + format +------------- + >>'Hello'<< +(1 row) + +select format('>>%2$*1$L<<', 0, 'Hello'); + format +------------- + >>'Hello'<< +(1 row) + diff --git a/src/test/regress/sql/text.sql b/src/test/regress/sql/text.sql index a96e9f7..1c68754 100644 --- a/src/test/regress/sql/text.sql +++ b/src/test/regress/sql/text.sql @@ -78,6 +78,8 @@ select format('%1$s %12$s', 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12);-- should failselect format('%1$s %4$s',1, 2, 3);select format('%1$s %13$s', 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12); +select format('%0$s', 'Hello'); +select format('%*0$s', 'Hello');select format('%1s', 1);select format('%1$', 1);select format('%1$1', 1); @@ -97,3 +99,36 @@ select format('Hello', variadic NULL);-- variadic argument allows simulating more than FUNC_MAX_ARGS parametersselectformat(string_agg('%s',','), variadic array_agg(i))from generate_series(1,200) g(i); + +-- left, right align +select format('>>%10s<<', 'Hello') +union all +select format('>>%10s<<', NULL) +union all +select format('>>%10s<<', '') +union all +select format('>>%-10s<<', '') +union all +select format('>>%-10s<<', 'Hello') 
+union all +select format('>>%-10s<<', NULL) +union all +select format('>>%1$10s<<', 'Hello') +union all +select format('>>%1$-10I<<', 'Hello') +union all +select format('>>%2$*1$L<<', 10, 'Hello') +union all +select format('>>%2$*1$L<<', 10, NULL) +union all +select format('>>%2$*1$L<<', -10, NULL) +union all +select format('>>%*s<<', 10, 'Hello'); + +select format('>>%*1$s<<', 10, 'Hello'); +select format('>>%-s<<', 'Hello'); + +-- NULL is not different to zero here +select format('>>%10L<<', NULL); +select format('>>%2$*1$L<<', NULL, 'Hello'); +select format('>>%2$*1$L<<', 0, 'Hello');
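The padding rule the patch implements in text_format_append_string() (pad on the left by default, on the right for the - flag or a negative width, and measure the string in characters via pg_mbstrlen() rather than in bytes) can be sketched outside the backend as follows. This is an approximation under stated assumptions: the input is assumed to be valid UTF-8, utf8_char_count is a hypothetical stand-in for pg_mbstrlen(), and pad_string returns a malloc'd string instead of appending to a StringInfo.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Count characters (not bytes) in a UTF-8 string by skipping continuation
 * bytes (10xxxxxx).  Hypothetical stand-in for pg_mbstrlen(); assumes
 * valid UTF-8.
 */
static size_t
utf8_char_count(const char *s)
{
    size_t      n = 0;

    for (; *s; s++)
        if (((unsigned char) *s & 0xC0) != 0x80)
            n++;
    return n;
}

/*
 * Return a malloc'd copy of str padded with spaces to at least `width`
 * characters: spaces go after the text if left_align, before it otherwise.
 * width must already be non-negative.  Returns NULL if malloc fails.
 */
static char *
pad_string(const char *str, int width, bool left_align)
{
    size_t      len;
    size_t      bytes;
    size_t      pad;
    char       *out;

    assert(str != NULL && width >= 0);

    len = utf8_char_count(str);
    bytes = strlen(str);
    pad = ((size_t) width > len) ? (size_t) width - len : 0;

    out = malloc(bytes + pad + 1);
    if (out == NULL)
        return NULL;

    if (left_align)
    {
        memcpy(out, str, bytes);
        memset(out + bytes, ' ', pad);
    }
    else
    {
        memset(out, ' ', pad);
        memcpy(out + pad, str, bytes);
    }
    out[bytes + pad] = '\0';
    return out;
}
```

With these assumptions, pad_string("ab", 4, false) gives "  ab", as %4s would, while a two-byte character such as "é" still counts as a single column. A string already wider than the requested width is copied unchanged, since width is only a minimum.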
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From
Pavel Stehule
Date:
2013/3/5 Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>:
> Hello,
>
>> > I think that the only other behavioural glitch I spotted was that it
>> > fails to catch one overflow case, which should generate an "out of
>> > range" error:
>> >
>> > select format('|%*s|', -2147483648, 'foo');
>> > Result: |foo|
>> >
>> > because -(-2147483648) = 0 in signed 32-bit integers.
>
> Ouch. Thanks for pointing that out.
>
>> fixed - also added another overflow check
>
> Your change shown below seems to assume that the two's complement
> of the most negative number in an integer type is identical to
> itself, and it does appear to work as expected, at least on
> linux/x86_64. But the C standard leaves this undefined (as far as
> I have heard :-).
>
> | if (width != 0)
> | {
> |     int32 _width = -width;
> |
> |     if (SAMESIGN(width, _width))
> |         ereport(ERROR,
>

this pattern is used elsewhere in pg

> Instead, I think we can reject it by simply comparing with
> INT_MIN. I modified the patch like so and made some styling
> modifications.

ok - I don't have enough experience with this topic to say which
is preferred.

>
> Finally, enlargeStringInfo fails right afterwards for large
> numbers. Perhaps we should keep the width under a certain length
> to avoid memory exhaustion.

I thought about it, but I don't know the correct value - probably any
width specification higher than 1MB will be bogus and could be
blocked?? Our VARLENA max size is 1GB, so width should never be
higher than 1GB.

What do you think about these limits?

Regards

Pavel

>
> Anyway, I'll send this patch to committers as it is in this
> message.
>
> best wishes,
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From
Dean Rasheed
Date:
On 5 March 2013 13:46, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> 2013/3/5 Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>:
>> Hello,
>>
>>> > I think that the only other behavioural glitch I spotted was that it
>>> > fails to catch one overflow case, which should generate an "out of
>>> > range" error:
>>> >
>>> > select format('|%*s|', -2147483648, 'foo');
>>> > Result: |foo|
>>> >
>>> > because -(-2147483648) = 0 in signed 32-bit integers.
>>
>> Ouch. Thanks for pointing that out.
>>
>>> fixed - also added another overflow check
>>
>> Your change shown below seems to assume that the two's complement
>> of the most negative number in an integer type is identical to
>> itself, and it does appear to work as expected, at least on
>> linux/x86_64. But the C standard leaves this undefined (as far as
>> I have heard :-).
>>
>> | if (width != 0)
>> | {
>> |     int32 _width = -width;
>> |
>> |     if (SAMESIGN(width, _width))
>> |         ereport(ERROR,
>>
>
> this pattern is used elsewhere in pg
>
>> Instead, I think we can reject it by simply comparing with
>> INT_MIN. I modified the patch like so and made some styling
>> modifications.
>
> ok - I don't have enough experience with this topic to say which
> is preferred.
>
>>
>> Finally, enlargeStringInfo fails right afterwards for large
>> numbers. Perhaps we should keep the width under a certain length
>> to avoid memory exhaustion.
>
> I thought about it, but I don't know the correct value - probably any
> width specification higher than 1MB will be bogus and could be
> blocked?? Our VARLENA max size is 1GB, so width should never be
> higher than 1GB.
>
> What do you think about these limits?
>

I think it's fine as it is.

It's no different from repeat() for example. We allow repeat('a',
1000000000) so allowing format('%1000000000s', '') seems reasonable,
although probably not very useful.

Upping either beyond 1GB generates an out of memory error, which also
seems reasonable -- I can't imagine why you would want such a long
string.

Regards,
Dean
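Dean's position is that no separate cap on the width is needed, because an oversized request already fails when the string buffer is enlarged past PostgreSQL's MaxAllocSize (1 GB minus one byte). The following hedged sketch shows an overflow-safe check in that spirit; MAX_ALLOC_SIZE is a hypothetical constant mirroring MaxAllocSize, and fits_in_alloc_limit is an illustrative helper, not a PostgreSQL function. The arithmetic is arranged so that size_t can never wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constant mirroring PostgreSQL's MaxAllocSize (1 GB - 1). */
#define MAX_ALLOC_SIZE ((size_t) 0x3fffffff)

/*
 * Would appending `len` bytes of text plus `pad` bytes of padding to a
 * buffer already holding `used` bytes still fit, counting the trailing NUL?
 */
static bool
fits_in_alloc_limit(size_t used, size_t len, size_t pad)
{
    size_t      room;

    /* Assumed precondition: the existing buffer already fits. */
    assert(used < MAX_ALLOC_SIZE);

    room = MAX_ALLOC_SIZE - used - 1;   /* bytes left after reserving the NUL */
    return len <= room && pad <= room - len;
}
```

Under this model, a width of 1000000000 as in Dean's format('%1000000000s', '') example still fits, while a width near 2000000000 is rejected, matching his description of where the out-of-memory error appears.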
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From
Pavel Stehule
Date:
2013/3/5 Dean Rasheed <dean.a.rasheed@gmail.com>:
> On 5 March 2013 13:46, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> 2013/3/5 Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>:
>>> Hello,
>>>
>>>> > I think that the only other behavioural glitch I spotted was that it
>>>> > fails to catch one overflow case, which should generate an "out of
>>>> > range" error:
>>>> >
>>>> > select format('|%*s|', -2147483648, 'foo');
>>>> > Result: |foo|
>>>> >
>>>> > because -(-2147483648) overflows in signed 32-bit integers.
>>>
>>> Ouch. Thanks for pointing that out.
>>>
>>>> fixed - another overflow check added
>>>
>>> Your change shown below seems to assume that the two's complement
>>> of the most negative number in integer types is identical to
>>> itself, and it appears to work as expected at least on
>>> linux/x86_64. But the C standard says this is undefined (as far as
>>> I hear :-).
>>>
>>> | if (width != 0)
>>> | {
>>> |     int32 _width = -width;
>>> |
>>> |     if (SAMESIGN(width, _width))
>>> |         ereport(ERROR,
>>>
>>
>> this pattern was used elsewhere in pg
>>
>>> Instead, I think we can reject it by simply comparing with
>>> INT_MIN. I modified the patch like so and made some styling
>>> changes.
>>
>> ok - I don't have enough experience with this topic and I cannot say
>> what is preferred.
>>
>>>
>>> Finally, enlargeStringInfo fails right afterwards for large numbers.
>>> We should perhaps keep the width under a certain length to avoid
>>> memory exhaustion.
>>
>> I thought about it, but I don't know a correct value - probably any
>> width specification higher than 1MB will be bogus and could be
>> blocked?? Our VARLENA max size is 1GB, so the width should never be
>> higher than 1GB.
>>
>> what do you think about these limits?
>>
>
> I think it's fine as it is.
>
> It's no different from repeat() for example. We allow repeat('a',
> 1000000000) so allowing format('%1000000000s', '') seems reasonable,
> although probably not very useful.
>
> Upping either beyond 1GB generates an out of memory error, which also
> seems reasonable -- I can't imagine why you would want such a long
> string.
>
> Regards,
> Dean

ok

Pavel
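Kyotaro's suggestion of comparing against INT_MIN before negating, instead of negating first and then testing SAMESIGN, can be sketched in C as below. This is a minimal illustration under stated assumptions, not the code that was committed: the helper names resolve_width and pad_text are hypothetical, a negative width is taken to mean left alignment (the printf convention for a `*` width), and pad_text assumes single-byte characters so that strlen() equals the character count, whereas Pavel's proposal pads by characters rather than bytes.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not PostgreSQL's committed code: convert a width
 * taken from a %*s argument into a non-negative field width plus an
 * alignment flag.  A negative width means left alignment.
 * Returns false when the width cannot be negated.
 */
bool
resolve_width(int width, int *field_width, bool *left_align)
{
    *left_align = false;
    if (width < 0)
    {
        /* Check before negating: -INT_MIN overflows, which is undefined in C. */
        if (width == INT_MIN)
            return false;       /* caller would report an out-of-range error */
        *left_align = true;
        width = -width;
    }
    *field_width = width;
    return true;
}

/*
 * Hypothetical helper: pad value with spaces up to field_width.
 * Assumes single-byte characters, so strlen() equals the character count.
 * Values already wider than field_width are copied unchanged (no truncation).
 * Returns a malloc'd string, or NULL if allocation fails.
 */
char *
pad_text(const char *value, int field_width, bool left_align)
{
    assert(field_width >= 0);

    size_t len = strlen(value);
    size_t pad = (size_t) field_width > len ? (size_t) field_width - len : 0;
    char *out = malloc(len + pad + 1);

    if (out == NULL)
        return NULL;
    if (left_align)
    {
        memcpy(out, value, len);
        memset(out + len, ' ', pad);    /* trailing spaces */
    }
    else
    {
        memset(out, ' ', pad);          /* leading spaces */
        memcpy(out + pad, value, len);
    }
    out[len + pad] = '\0';
    return out;
}
```

With these helpers, a width of 4 applied to "ab" yields "  ab", a width of -4 yields "ab  ", and INT_MIN is rejected up front instead of silently producing |foo| as in Dean's report. The explicit comparison does not rely on how the hardware wraps the negation, which is the assumption Kyotaro pointed out in the SAMESIGN version.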
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From
Alvaro Herrera
Date:
Kyotaro HORIGUCHI wrote:
> Umm. sorry,
>
> > If you have no problem with this, I'll send this to a committer.
>
> I just found that this patch already has a reviewer. I had only looked
> at the Status field in the patch list..

Patches can be reviewed by more than one person. It's particularly
useful if they have different things to say. So don't hesitate to
review patches that have already been reviewed by other people.

In fact, you can even review committed patches; it's not unlikely that
you will be able to find bugs in those, too.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From
Kyotaro HORIGUCHI
Date:
Hello,

> Patches can be reviewed by more than one person. It's particularly
> useful if they have different things to say. So don't hesitate to
> review patches that have already been reviewed by other people.

Thanks for the advice. I was anxious about which of the reviewers is
responsible, and when to decide whether the patch has reached the
required level or not. I'll take it more easily.

> In fact, you can even review committed patches; it's not unlikely that
> you will be able to find bugs in those, too.

Umm.. to be sure..

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From
Tom Lane
Date:
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
> [ format-width-20130305.patch ]

Applied with some mostly-cosmetic adjustments. I also took the
liberty of changing some of the error message texts to line up
more closely with the expanded documentation (eg, use "format
specifier" not "conversion specifier" because that's the phrase
used in the docs).

			regards, tom lane
Re: Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
From
Kyotaro HORIGUCHI
Date:
Thank you for committing this patch.

> Applied with some mostly-cosmetic adjustments. I also took the
> liberty of changing some of the error message texts to line up
> more closely with the expanded documentation (eg, use "format
> specifier" not "conversion specifier" because that's the phrase
> used in the docs).

I looked over the modifications. Thanks for refining a rather large
portion of the documentation and comments... and code.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center