Thread: WIP: to_char, support for EEEE format

WIP: to_char, support for EEEE format

From
Pavel Stehule
Date:
Hello

I was surprised so PostgreSQL doesn't support this basic output format.

Regards
Pavel Stehule

Attachment

Re: WIP: to_char, support for EEEE format

From
Brendan Jurd
Date:
On Sat, Apr 11, 2009 at 2:16 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> I was surprised so PostgreSQL doesn't support this basic output format.
>

Hi Pavel,

I had a look through your patch and would like to suggest improvements
to the new error messages you've introduced.

1. "invalid using of format EEEE"

This message occurs several times in the patch.  For one thing, the
grammar is wrong; it should be "use", not "using".

Additionally, this message on its own is not very helpful.  If I was
trying to use to_char and got "invalid use of format" my first thought
would be "Invalid how?"  The message should at minimum have a DETAIL,
and possibly a HINT as well to make it effective.

2. "cannot use EEEE and others"

The wording on this message is a bit awkward.  I think what you meant
to say is that EEEE cannot be used with certain other formatting
codes, but this should be made explicit in the message.

Cheers,
BJ


Re: WIP: to_char, support for EEEE format

From
Pavel Stehule
Date:
Hi,

I know very well, so all texts in my patches should be translated to
English. My language skills are really minimal.

So, please, if you can, propose these error messages (with hints)-
result will be much better.

Thank you
Pavel


2009/4/10 Brendan Jurd <direvus@gmail.com>:
> On Sat, Apr 11, 2009 at 2:16 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>>
>> I was surprised so PostgreSQL doesn't support this basic output format.
>>
>
> Hi Pavel,
>
> I had a look through your patch and would like to suggest improvements
> to the new error messages you've introduced.
>
> 1. "invalid using of format EEEE"
>
> This message occurs several times in the patch.  For one thing, the
> grammar is wrong; it should be "use", not "using".
>
> Additionally, this message on its own is not very helpful.  If I was
> trying to use to_char and got "invalid use of format" my first thought
> would be "Invalid how?"  The message should at minimum have a DETAIL,
> and possibly a HINT as well to make it effective.
>
> 2. "cannot use EEEE and others"
>
> The wording on this message is a bit awkward.  I think what you meant
> to say is that EEEE cannot be used with certain other formatting
> codes, but this should be made explicit in the message.
>
> Cheers,
> BJ
>


Re: WIP: to_char, support for EEEE format

From
Brendan Jurd
Date:
On Sat, Apr 11, 2009 at 3:51 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> So, please, if you can, propose these error messages (with hints)-
> result will be much better.
>

Hi Pavel,

I was doing some work on rewording these error messages, and I noticed
that the following code segment occurs identically in four different
locations:
    numstr = orgnum = (char *) palloc(MAXDOUBLEWIDTH + 1);    if (Num.pre != 1)        ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),                errmsg("invalid using of format EEEE")));
 

Rather than rewording all four copies of the message, I wonder if this
test might be better factored out into a separate function?

Cheers,
BJ


Re: WIP: to_char, support for EEEE format

From
Pavel Stehule
Date:
2009/4/21 Brendan Jurd <direvus@gmail.com>:
> On Sat, Apr 11, 2009 at 3:51 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> So, please, if you can, propose these error messages (with hints)-
>> result will be much better.
>>
>
> Hi Pavel,
>
> I was doing some work on rewording these error messages, and I noticed
> that the following code segment occurs identically in four different
> locations:
>
>                numstr = orgnum = (char *) palloc(MAXDOUBLEWIDTH + 1);
>                if (Num.pre != 1)
>                        ereport(ERROR,
>                                        (errcode(ERRCODE_SYNTAX_ERROR),
>                                         errmsg("invalid using of format EEEE")));
>


> Rather than rewording all four copies of the message, I wonder if this
> test might be better factored out into a separate function?

maybe macro is better - it is too short and without any semantic for
function, but maybe not. The length of source code is not problem -
the short function will be inlined, so total length will be same. What
should be name for this function or for this macro? It hasn't any
semantic. There should be readable macro only for ereport function -
some

#define RAISE_EEEE_FORMAT_ERROR        ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR), errmsg("invalid using of format
EEEE")));

regards
Pavel Stehule

>
> Cheers,
> BJ
>


Re: WIP: to_char, support for EEEE format

From
Brendan Jurd
Date:
On Wed, Apr 22, 2009 at 10:13 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> 2009/4/21 Brendan Jurd <direvus@gmail.com>:
>>                numstr = orgnum = (char *) palloc(MAXDOUBLEWIDTH + 1);
>>                if (Num.pre != 1)
>>                        ereport(ERROR,
>>                                        (errcode(ERRCODE_SYNTAX_ERROR),
>>                                         errmsg("invalid using of format EEEE")));
>>
>> Rather than rewording all four copies of the message, I wonder if this
>> test might be better factored out into a separate function?
>
> maybe macro is better - it is too short and without any semantic for
> function, but maybe not. The length of source code is not problem -
> the short function will be inlined, so total length will be same. What
> should be name for this function or for this macro? It hasn't any
> semantic. There should be readable macro only for ereport function -
> some

I was thinking of factoring out the *test*, not just the error message.

If I've been reading this code correctly, the purpose of
               if (Num.pre != 1)

is to make sure that the numeric format has been given with one digit
before the decimal place (so 9.99EEEE would be acceptable but
99.999EEEE would cause the ERROR).

Because this check is made from various places in the code, it makes
sense to me that it should be a function.  Duplicated code makes me
itchy.  Perhaps called something like
sci_notation_check_format(NUMDesc *).

Cheers,
BJ


Re: WIP: to_char, support for EEEE format

From
Brendan Jurd
Date:
On Sat, Apr 11, 2009 at 3:51 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> So, please, if you can, propose these error messages (with hints)-
> result will be much better.
>

Hi Pavel, hackers.

I've done some work updating Pavel's sci notation patch for to_char().
 I've attached a patch against HEAD (EEEE_2.diff.bz2) and against
Pavel's patch as originally submitted to the list
(EEEE_1-to-2.diff.bz2).

A summary of my changes:

 * Improve the wording of error messages, and add some detail/hint parts.
 * Update the documentation.
 * Move the duplicated "one digit before decimal point" test into a macro.
 * Fix a bug in the int8_to_char() implementation (was using the wrong
variable).

Hope you find this useful.

Cheers,
BJ

Attachment

Re: WIP: to_char, support for EEEE format

From
Pavel Stehule
Date:
Hi Brendan

this looks well, so please, add it to commitfest page

regards
Pavel Stehule

2009/4/26 Brendan Jurd <direvus@gmail.com>:
> On Sat, Apr 11, 2009 at 3:51 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> So, please, if you can, propose these error messages (with hints)-
>> result will be much better.
>>
>
> Hi Pavel, hackers.
>
> I've done some work updating Pavel's sci notation patch for to_char().
>  I've attached a patch against HEAD (EEEE_2.diff.bz2) and against
> Pavel's patch as originally submitted to the list
> (EEEE_1-to-2.diff.bz2).
>
> A summary of my changes:
>
>  * Improve the wording of error messages, and add some detail/hint parts.
>  * Update the documentation.
>  * Move the duplicated "one digit before decimal point" test into a macro.
>  * Fix a bug in the int8_to_char() implementation (was using the wrong
> variable).
>
> Hope you find this useful.
>
> Cheers,
> BJ
>


Re: WIP: to_char, support for EEEE format

From
Brendan Jurd
Date:
2009/4/26 Brendan Jurd <direvus@gmail.com>:
> I've done some work updating Pavel's sci notation patch for to_char().

That patch again, now with a couple of minor tweaks to make it apply
cleanly against the current HEAD.

Cheers,
BJ

Attachment

Re: WIP: to_char, support for EEEE format

From
Euler Taveira de Oliveira
Date:
Brendan Jurd escreveu:
> 2009/4/26 Brendan Jurd <direvus@gmail.com>:
>> I've done some work updating Pavel's sci notation patch for to_char().
> 
> That patch again, now with a couple of minor tweaks to make it apply
> cleanly against the current HEAD.
> 
Here is my review. The patch applied without problems. The docs and regression
tests are included. Both of them worked as expected. Also, you included a fix
in RN format, do it in another patch.

The behavior is not the same as Oracle. Oracle accepts an invalid scientific
notation '999.9EEEE'. Will we support it too? I think so.

euler=# SELECT to_char(1234.56789, '999.9EEEE');
ERRO:  invalid format for scientific notation
DETALHE:  "EEEE" requires exactly one digit before the decimal point.
DICA:  For example, "9.999EEEE" is a valid format.

TO_CHAR(1234.56789,'999.9EEEE')
------------------------------- 1.2E+03

1 rows selected

The '9.999eeee' format error message is misleading.

euler=# select to_char(123, '9.999eeee');
ERRO:  cannot use "EEEE" twice

You could include an example in manual too. You could add the two failing
cases above in regression tests too.


--  Euler Taveira de Oliveira http://www.timbira.com/


Re: WIP: to_char, support for EEEE format

From
Brendan Jurd
Date:
2009/7/24 Euler Taveira de Oliveira <euler@timbira.com>:
> Here is my review. The patch applied without problems. The docs and regression
> tests are included. Both of them worked as expected. Also, you included a fix
> in RN format, do it in another patch.
>

Well, I updated an error message for RN to keep it consistent with the
change I made to the nearby EEEE error message.

Neither RN or EEEE is supported for input, and the error messages were
vague on this point (they just said "not supported").

I understand that separate improvements should be submitted as
separate patches, but this is really part of the one improvement.
Implementing EEEE required improving the error messages, and
consistency required that we improve the RN error message also.

> The behavior is not the same as Oracle. Oracle accepts an invalid scientific
> notation '999.9EEEE'. Will we support it too? I think so.
>
> euler=# SELECT to_char(1234.56789, '999.9EEEE');
> ERRO:  invalid format for scientific notation
> DETALHE:  "EEEE" requires exactly one digit before the decimal point.
> DICA:  For example, "9.999EEEE" is a valid format.
>
> TO_CHAR(1234.56789,'999.9EEEE')
> -------------------------------
>  1.2E+03

*shakes fist at Oracle* yes, I suppose we had better follow suit.

> The '9.999eeee' format error message is misleading.
>
> euler=# select to_char(123, '9.999eeee');
> ERRO:  cannot use "EEEE" twice

Ah, thanks for picking up on this.  This was a bug in the original
patch.  Looks like we forgot to update the formatting keyword for
lowercase "e".

>
> You could include an example in manual too. You could add the two failing
> cases above in regression tests too.
>

I had already added an example to the manual.

Please find attached version 4 of the patch, and incremental diff from
version 3.  It fixes the "eeee" bug ("eeee" is now accepted as a valid
form of "EEEE"), and lifts the restriction on only having one digit
before the decimal point.

Cheers,
BJ

Attachment

Re: WIP: to_char, support for EEEE format

From
Robert Haas
Date:
On Sat, Jul 25, 2009 at 2:08 AM, Brendan Jurd<direvus@gmail.com> wrote:
> 2009/7/24 Euler Taveira de Oliveira <euler@timbira.com>:
>> Here is my review. The patch applied without problems. The docs and regression
>> tests are included. Both of them worked as expected. Also, you included a fix
>> in RN format, do it in another patch.
>>
>
> Well, I updated an error message for RN to keep it consistent with the
> change I made to the nearby EEEE error message.
>
> Neither RN or EEEE is supported for input, and the error messages were
> vague on this point (they just said "not supported").
>
> I understand that separate improvements should be submitted as
> separate patches, but this is really part of the one improvement.
> Implementing EEEE required improving the error messages, and
> consistency required that we improve the RN error message also.
>
>> The behavior is not the same as Oracle. Oracle accepts an invalid scientific
>> notation '999.9EEEE'. Will we support it too? I think so.
>>
>> euler=# SELECT to_char(1234.56789, '999.9EEEE');
>> ERRO:  invalid format for scientific notation
>> DETALHE:  "EEEE" requires exactly one digit before the decimal point.
>> DICA:  For example, "9.999EEEE" is a valid format.
>>
>> TO_CHAR(1234.56789,'999.9EEEE')
>> -------------------------------
>>  1.2E+03
>
> *shakes fist at Oracle* yes, I suppose we had better follow suit.
>
>> The '9.999eeee' format error message is misleading.
>>
>> euler=# select to_char(123, '9.999eeee');
>> ERRO:  cannot use "EEEE" twice
>
> Ah, thanks for picking up on this.  This was a bug in the original
> patch.  Looks like we forgot to update the formatting keyword for
> lowercase "e".
>
>>
>> You could include an example in manual too. You could add the two failing
>> cases above in regression tests too.
>>
>
> I had already added an example to the manual.
>
> Please find attached version 4 of the patch, and incremental diff from
> version 3.  It fixes the "eeee" bug ("eeee" is now accepted as a valid
> form of "EEEE"), and lifts the restriction on only having one digit
> before the decimal point.

Euler,

Are you going to review this again?  We need to know if it is Ready
for Committer.

...Robert


Re: WIP: to_char, support for EEEE format

From
Euler Taveira de Oliveira
Date:
Brendan Jurd escreveu:
> Please find attached version 4 of the patch, and incremental diff from
> version 3.  It fixes the "eeee" bug ("eeee" is now accepted as a valid
> form of "EEEE"), and lifts the restriction on only having one digit
> before the decimal point.
> 
Looks better but I did some tests and caught some strange behaviors.

SQL> SELECT to_char(1234.56789, '8.999EEEE') FROM DUAL;
SELECT to_char(1234.56789, '8.999EEEE') FROM DUAL

ERROR at line 1:
ORA-01481: invalid number format model


SQL> SELECT to_char(1234.56789, '9.080EEEE') FROM DUAL;
SELECT to_char(1234.56789, '9.080EEEE') FROM DUAL

ERROR at line 1:
ORA-01481: invalid number format model

This is not a problem with your patch but something that needs to be fixed in
PostgreSQL to match Oracle behavior. The following example should emit an
error. IMHO, filling the string with # is very strange. TODO?

euler=# SELECT to_char(1234.56789, '9.080');to_char
--------- #.#8#
(1 row)

Couldn't the following code be put inside switch clause? If not, you should
add a comment why the validation is outside switch.

+   if (IS_EEEE(num) && n->key->id != NUM_E)
+   {
+       NUM_cache_remove(last_NUMCacheEntry);
+       ereport(ERROR,
+               (errcode(ERRCODE_SYNTAX_ERROR),
+                errmsg("\"EEEE\" must be the last pattern used")));
+   }
+   switch (n->key->id)   {       case NUM_9:

Oracle has a diferent overflow limit [1] but I think we could stay with the
PostgreSQL one. But the #.#### is not the intended behavior. IIRC you're
limited to 99 exponent.

SQL> SELECT to_char(1.234567E+308, '9.999EEEE');
SELECT to_char(1.234567E+308, '9.999EEEE')

ERROR at line 1:
ORA-01426: numeric overflow

euler=# SELECT to_char(1.234567E+308, '9.999EEEE'); to_char
-----------#.#######
(1 row)

The problem is in numeric_to_char() and float8_to_char(). You could fix it
with the following code. Besides that I think you should comment why '5' or
'6' in the other *_to_char() functions.

+    /* 6 means '.' (decimal point), 'E', '+', and 3 exponent digits */
+       if (isnan(value) || is_infinite(value) ||
+            len > Num.pre + Num.post + 6)
+       {
+           numstr = (char *) palloc(Num.pre + Num.post + 7);
+           fill_str(numstr, '#', Num.pre + Num.post + 6);
+           *(numstr + Num.pre) = '.';
+       }

I can't see more problems in your patch. When you fix it, it'll be ready for a
committer.


[1]
http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/sql_elements001.htm#sthref80


--  Euler Taveira de Oliveira http://www.timbira.com/


Re: WIP: to_char, support for EEEE format

From
Brendan Jurd
Date:
2009/7/29 Euler Taveira de Oliveira <euler@timbira.com>:
> Looks better but I did some tests and caught some strange behaviors.
>

Pavel,

Any comment on these cases?  When I looked at the patch I wasn't
really sure why it filled the string with '#' either, but I didn't
question it.

Cheers,
BJ


Re: WIP: to_char, support for EEEE format

From
Pavel Stehule
Date:
2009/7/29 Brendan Jurd <direvus@gmail.com>:
> 2009/7/29 Euler Taveira de Oliveira <euler@timbira.com>:
>> Looks better but I did some tests and caught some strange behaviors.
>>
>
> Pavel,
>
> Any comment on these cases?  When I looked at the patch I wasn't
> really sure why it filled the string with '#' either, but I didn't
> question it.

This is consistent with postgres.

Pavel

>
> Cheers,
> BJ
>


Re: WIP: to_char, support for EEEE format

From
Brendan Jurd
Date:
2009/7/29 Euler Taveira de Oliveira <euler@timbira.com>:
> This is not a problem with your patch but something that needs to be fixed in
> PostgreSQL to match Oracle behavior. The following example should emit an
> error. IMHO, filling the string with # is very strange. TODO?
>
> euler=# SELECT to_char(1234.56789, '9.080');
>  to_char
> ---------
>  #.#8#
> (1 row)

The formatting functions have a lot of weird corner cases.  I've been
trying to clean up some of the more bizarre behaviours in the
date/time formatting functions, but haven't touched the numeric
formatting because I haven't ever needed to use it.

Filling unused characters in the string with "#" may be strange, but
changing it would require a much broader patch that covers all of the
numeric formatting styles, not just EEEE.  A TODO is probably the way
to go.

>
> Couldn't the following code be put inside switch clause? If not, you should
> add a comment why the validation is outside switch.
>
> +   if (IS_EEEE(num) && n->key->id != NUM_E)
> +   {
> +       NUM_cache_remove(last_NUMCacheEntry);
> +       ereport(ERROR,
> +               (errcode(ERRCODE_SYNTAX_ERROR),
> +                errmsg("\"EEEE\" must be the last pattern used")));
> +   }
> +
>    switch (n->key->id)
>    {
>        case NUM_9:

The switch is on (n->key->id), but the test you mentioned above is
looking for any keywords *other than* the EEEE keyword, where EEEE has
previously been parsed.

So if you put the test inside the switch, it would need to appear in
every single branch of the switch except for the NUM_E one.  I'm
confused about why you think this needs a comment.  Perhaps I
misunderstood you?

>
> Oracle has a diferent overflow limit [1] but I think we could stay with the
> PostgreSQL one. But the #.#### is not the intended behavior. IIRC you're
> limited to 99 exponent.
>
> SQL> SELECT to_char(1.234567E+308, '9.999EEEE');
> SELECT to_char(1.234567E+308, '9.999EEEE')
>
> ERROR at line 1:
> ORA-01426: numeric overflow
>
> euler=# SELECT to_char(1.234567E+308, '9.999EEEE');
>  to_char
> -----------
>  #.#######
> (1 row)

I don't see any problem with extending this to allow up to 3 exponent
digits ... Pavel, any comment?

>
> The problem is in numeric_to_char() and float8_to_char(). You could fix it
> with the following code. Besides that I think you should comment why '5' or
> '6' in the other *_to_char() functions.
>
> +       /* 6 means '.' (decimal point), 'E', '+', and 3 exponent digits */

Agreed about the comment; I'll add it.

Cheers,
BJ


Re: WIP: to_char, support for EEEE format

From
Pavel Stehule
Date:
2009/7/29 Brendan Jurd <direvus@gmail.com>:
> 2009/7/29 Euler Taveira de Oliveira <euler@timbira.com>:
>> This is not a problem with your patch but something that needs to be fixed in
>> PostgreSQL to match Oracle behavior. The following example should emit an
>> error. IMHO, filling the string with # is very strange. TODO?
>>
>> euler=# SELECT to_char(1234.56789, '9.080');
>>  to_char
>> ---------
>>  #.#8#
>> (1 row)
>
> The formatting functions have a lot of weird corner cases.  I've been
> trying to clean up some of the more bizarre behaviours in the
> date/time formatting functions, but haven't touched the numeric
> formatting because I haven't ever needed to use it.
>
> Filling unused characters in the string with "#" may be strange, but
> changing it would require a much broader patch that covers all of the
> numeric formatting styles, not just EEEE.  A TODO is probably the way
> to go.
>
>>
>> Couldn't the following code be put inside switch clause? If not, you should
>> add a comment why the validation is outside switch.
>>
>> +   if (IS_EEEE(num) && n->key->id != NUM_E)
>> +   {
>> +       NUM_cache_remove(last_NUMCacheEntry);
>> +       ereport(ERROR,
>> +               (errcode(ERRCODE_SYNTAX_ERROR),
>> +                errmsg("\"EEEE\" must be the last pattern used")));
>> +   }
>> +
>>    switch (n->key->id)
>>    {
>>        case NUM_9:
>
> The switch is on (n->key->id), but the test you mentioned above is
> looking for any keywords *other than* the EEEE keyword, where EEEE has
> previously been parsed.
>
> So if you put the test inside the switch, it would need to appear in
> every single branch of the switch except for the NUM_E one.  I'm
> confused about why you think this needs a comment.  Perhaps I
> misunderstood you?
>
>>
>> Oracle has a diferent overflow limit [1] but I think we could stay with the
>> PostgreSQL one. But the #.#### is not the intended behavior. IIRC you're
>> limited to 99 exponent.
>>
>> SQL> SELECT to_char(1.234567E+308, '9.999EEEE');
>> SELECT to_char(1.234567E+308, '9.999EEEE')
>>
>> ERROR at line 1:
>> ORA-01426: numeric overflow
>>
>> euler=# SELECT to_char(1.234567E+308, '9.999EEEE');
>>  to_char
>> -----------
>>  #.#######
>> (1 row)
>
> I don't see any problem with extending this to allow up to 3 exponent
> digits ... Pavel, any comment?
>

I am not sure - this function should be used in reports witl fixed
line's width. And I am thinking, so it's should be problem - I prefer
showing some #.### chars. It's clean signal, so some is wrong, but it
doesn't break generating long run reports (like exception in Oracle)
and doesn't broke formating like 3 exponent digits.

Pavel

>>
>> The problem is in numeric_to_char() and float8_to_char(). You could fix it
>> with the following code. Besides that I think you should comment why '5' or
>> '6' in the other *_to_char() functions.
>>
>> +       /* 6 means '.' (decimal point), 'E', '+', and 3 exponent digits */
>
> Agreed about the comment; I'll add it.
>
> Cheers,
> BJ
>


Re: WIP: to_char, support for EEEE format

From
Euler Taveira de Oliveira
Date:
Brendan Jurd escreveu:
> Filling unused characters in the string with "#" may be strange, but
> changing it would require a much broader patch that covers all of the
> numeric formatting styles, not just EEEE.  A TODO is probably the way
> to go.
> 
That was my suggestion: a TODO.

> So if you put the test inside the switch, it would need to appear in
> every single branch of the switch except for the NUM_E one.  I'm
> confused about why you think this needs a comment.  Perhaps I
> misunderstood you?
> 
Yes, I know you need to modify every 'case' clause to test if EEEE was
previously used (that was one suggestion) but I said if you don't want to go
that way, add a comment explaining why you're using that 'if' above the
'switch' and not inside it.


--  Euler Taveira de Oliveira http://www.timbira.com/


Re: WIP: to_char, support for EEEE format

From
Brendan Jurd
Date:
2009/7/30 Pavel Stehule <pavel.stehule@gmail.com>:
> 2009/7/29 Brendan Jurd <direvus@gmail.com>:
>> I don't see any problem with extending this to allow up to 3 exponent
>> digits ... Pavel, any comment?
>
> I am not sure - this function should be used in reports witl fixed
> line's width. And I am thinking, so it's should be problem - I prefer
> showing some #.### chars. It's clean signal, so some is wrong, but it
> doesn't break generating long run reports (like exception in Oracle)
> and doesn't broke formating like 3 exponent digits.

Hmm.  For what it's worth, I think Pavel makes a good point about the
number of exponent digits -- a large chunk of the use case for numeric
formatting would be fixed-width reporting.

Limiting to two exponent digits also has the nice property that the
output always matches the length of the format pattern:

9.99EEEE
1.23E+02

I don't know whether being able to represent 3-digit exponents
outweighs the value of reliable fixed-width output.  Would anyone else
care to throw in their opinion?  However we end up handling it, we
will probably need to flesh out the docs regarding this.

Cheers,
BJ


Re: WIP: to_char, support for EEEE format

From
Brendan Jurd
Date:
2009/7/30 Euler Taveira de Oliveira <euler@timbira.com>:
>> So if you put the test inside the switch, it would need to appear in
>> every single branch of the switch except for the NUM_E one.  I'm
>> confused about why you think this needs a comment.  Perhaps I
>> misunderstood you?
>>
> Yes, I know you need to modify every 'case' clause to test if EEEE was
> previously used (that was one suggestion) but I said if you don't want to go
> that way, add a comment explaining why you're using that 'if' above the
> 'switch' and not inside it.
>

I think we've pretty much reached an impasse on this one.

I can't imagine anyone reading the code getting confused about this,
and don't know how to go about writing a comment explaining something
that is intuitively obvious to me.  I don't understand what aspect of
it requires explanation.  The test is not in the switch because it
doesn't belong there.

Perhaps someone else could weigh in and help us to resolve this?

Cheers,
BJ


Re: WIP: to_char, support for EEEE format

From
Euler Taveira de Oliveira
Date:
Brendan Jurd escreveu:
> Hmm.  For what it's worth, I think Pavel makes a good point about the
> number of exponent digits -- a large chunk of the use case for numeric
> formatting would be fixed-width reporting.
> 
But it doesn't cover all numbers in the interval. And in this case, all
numbers can be easily represented.

> Limiting to two exponent digits also has the nice property that the
> output always matches the length of the format pattern:
> 
> 9.99EEEE
> 1.23E+02
> 
I don't think neglecting to represent a valid number is a "nice property".
What does the length of format pattern have to do with output format?


--  Euler Taveira de Oliveira http://www.timbira.com/


Re: WIP: to_char, support for EEEE format

From
Euler Taveira de Oliveira
Date:
Brendan Jurd escreveu:
> I can't imagine anyone reading the code getting confused about this,
> and don't know how to go about writing a comment explaining something
> that is intuitively obvious to me.  I don't understand what aspect of
> it requires explanation.  The test is not in the switch because it
> doesn't belong there.
> 
It was just a suggestion. If you think it is self-explanatory, so it is. But I
(as the first time reading that piece of code) take some time to figure out
why that test is outside the switch.


--  Euler Taveira de Oliveira http://www.timbira.com/


Re: WIP: to_char, support for EEEE format

From
Robert Haas
Date:
On Thu, Jul 30, 2009 at 12:17 PM, Euler Taveira de
Oliveira<euler@timbira.com> wrote:
> Brendan Jurd escreveu:
>> I can't imagine anyone reading the code getting confused about this,
>> and don't know how to go about writing a comment explaining something
>> that is intuitively obvious to me.  I don't understand what aspect of
>> it requires explanation.  The test is not in the switch because it
>> doesn't belong there.
>>
> It was just a suggestion. If you think it is self-explanatory, so it is. But I
> (as the first time reading that piece of code) take some time to figure out
> why that test is outside the switch.

I don't think we should worry about this particular issue too much.
If the biggest problem this patch has is whether or not this deserves
a comment, we're doing well; the committer can (and doubtless will)
adjust that as they may see fit.

...Robert


Re: WIP: to_char, support for EEEE format

From
Robert Haas
Date:
On Thu, Jul 30, 2009 at 10:35 AM, Brendan Jurd<direvus@gmail.com> wrote:
> 2009/7/30 Pavel Stehule <pavel.stehule@gmail.com>:
>> 2009/7/29 Brendan Jurd <direvus@gmail.com>:
>>> I don't see any problem with extending this to allow up to 3 exponent
>>> digits ... Pavel, any comment?
>>
>> I am not sure - this function should be used in reports witl fixed
>> line's width. And I am thinking, so it's should be problem - I prefer
>> showing some #.### chars. It's clean signal, so some is wrong, but it
>> doesn't break generating long run reports (like exception in Oracle)
>> and doesn't broke formating like 3 exponent digits.
>
> Hmm.  For what it's worth, I think Pavel makes a good point about the
> number of exponent digits -- a large chunk of the use case for numeric
> formatting would be fixed-width reporting.
>
> Limiting to two exponent digits also has the nice property that the
> output always matches the length of the format pattern:
>
> 9.99EEEE
> 1.23E+02
>
> I don't know whether being able to represent 3-digit exponents
> outweighs the value of reliable fixed-width output.  Would anyone else
> care to throw in their opinion?  However we end up handling it, we
> will probably need to flesh out the docs regarding this.

Well, what if my whole database is full of numbers with three and four
digit exponents?  Do I have an out, or am I just hosed?

Apologies if this is a stupid question, I haven't read this patch.

...Robert


Re: WIP: to_char, support for EEEE format

From
Pavel Stehule
Date:
2009/7/30 Robert Haas <robertmhaas@gmail.com>:
> On Thu, Jul 30, 2009 at 10:35 AM, Brendan Jurd<direvus@gmail.com> wrote:
>> 2009/7/30 Pavel Stehule <pavel.stehule@gmail.com>:
>>> 2009/7/29 Brendan Jurd <direvus@gmail.com>:
>>>> I don't see any problem with extending this to allow up to 3 exponent
>>>> digits ... Pavel, any comment?
>>>
>>> I am not sure - this function should be used in reports witl fixed
>>> line's width. And I am thinking, so it's should be problem - I prefer
>>> showing some #.### chars. It's clean signal, so some is wrong, but it
>>> doesn't break generating long run reports (like exception in Oracle)
>>> and doesn't broke formating like 3 exponent digits.
>>
>> Hmm.  For what it's worth, I think Pavel makes a good point about the
>> number of exponent digits -- a large chunk of the use case for numeric
>> formatting would be fixed-width reporting.
>>
>> Limiting to two exponent digits also has the nice property that the
>> output always matches the length of the format pattern:
>>
>> 9.99EEEE
>> 1.23E+02
>>
>> I don't know whether being able to represent 3-digit exponents
>> outweighs the value of reliable fixed-width output.  Would anyone else
>> care to throw in their opinion?  However we end up handling it, we
>> will probably need to flesh out the docs regarding this.
>
> Well, what if my whole database is full of numbers with three and four
> digit exponents?  Do I have an out, or am I just hosed?
>

Maybe we should to support some modificator like Large EEEE - LEEEE or EEEEE

?? that it use 3-digit exponents

Pavel

> Apologies if this is a stupid question, I haven't read this patch.
>
> ...Robert
>


Re: WIP: to_char, support for EEEE format

From
Brendan Jurd
Date:
2009/7/31 Euler Taveira de Oliveira <euler@timbira.com>:
> Brendan Jurd escreveu:
>> Limiting to two exponent digits also has the nice property that the
>> output always matches the length of the format pattern:
>>
>> 9.99EEEE
>> 1.23E+02
>>
> I don't think neglecting to represent a valid number is a "nice property".
> What does the length of format pattern have to do with output format?

Most of the format patterns in to_char() are chosen to match the
length of their expected output.  The output of "DD" is two
characters, "Mon" is three and so on.

That's why the scientific notation pattern is "EEEE" and not just "E".

There're certainly exceptions though.  "SG" is actually output as one
character, and "YYYY" expands to however many digits you happen to
have in your year value.

In fact "YYYY" is probably our closest analogy in to_char() to the
number of digits in the exponent.  In almost all cases, it will come
out as four characters but for extraordinarily large values it will
occupy more characters as necessary.

This breaks fixed-width, but is probably a better outcome than failure
to represent the value.

If we apply the same logic to EEEE then it should expand to admit
whatever exponent was given to it.

Cheers,
BJ


Re: WIP: to_char, support for EEEE format

From
Tom Lane
Date:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> 2009/7/30 Robert Haas <robertmhaas@gmail.com>:
>> On Thu, Jul 30, 2009 at 10:35 AM, Brendan Jurd<direvus@gmail.com> wrote:
>>> Hmm. For what it's worth, I think Pavel makes a good point about the
>>> number of exponent digits -- a large chunk of the use case for numeric
>>> formatting would be fixed-width reporting.

+1.  If you aren't trying to get the format exactly so, it's not clear
why you're bothering with to_char() at all.

> Maybe we should to support some modificator like Large EEEE - LEEEE or EEEEE

Five (or more?) E's seems like a natural extension to me.  However, that
still leaves us with the question of what to do when the exponent
doesn't fit in however many digits we'd like to print.  Seems like the
options are* print #'s* force the output wider* throw an error
None of these are very nice, but the first two could cause problems that
you don't find out until it's too late to fix.  What about throwing an
error?

Sorry if this was already covered in the thread, but: anybody know what
Oracle does for this?
        regards, tom lane


Re: WIP: to_char, support for EEEE format

From
Robert Haas
Date:
On Thu, Jul 30, 2009 at 12:46 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>> 2009/7/30 Robert Haas <robertmhaas@gmail.com>:
>>> On Thu, Jul 30, 2009 at 10:35 AM, Brendan Jurd<direvus@gmail.com> wrote:
>>>> Hmm. For what it's worth, I think Pavel makes a good point about the
>>>> number of exponent digits -- a large chunk of the use case for numeric
>>>> formatting would be fixed-width reporting.
>
> +1.  If you aren't trying to get the format exactly so, it's not clear
> why you're bothering with to_char() at all.
>
>> Maybe we should to support some modificator like Large EEEE - LEEEE or EEEEE
>
> Five (or more?) E's seems like a natural extension to me.  However, that
> still leaves us with the question of what to do when the exponent
> doesn't fit in however many digits we'd like to print.  Seems like the
> options are

The tricky thing here is that there are two independent parameters
here - the MINIMUM number of exponent digits you want (which might be
1, 2, 3, or more), and the MAXIMUM number of exponent digits you want
(which might be the same as minimum, or a larger number, or infinity).

...Robert


Re: WIP: to_char, support for EEEE format

From
Euler Taveira de Oliveira
Date:
Brendan Jurd escreveu:
> 2009/7/31 Euler Taveira de Oliveira <euler@timbira.com>:
>> Brendan Jurd escreveu:
>>> Limiting to two exponent digits also has the nice property that the
>>> output always matches the length of the format pattern:
>>>
>>> 9.99EEEE
>>> 1.23E+02
>>>
>> I don't think neglecting to represent a valid number is a "nice property".
>> What does the length of format pattern have to do with output format?
> 
> Most of the format patterns in to_char() are chosen to match the
> length of their expected output.  The output of "DD" is two
> characters, "Mon" is three and so on.
> 
Brendan, the main point of to_*() functions is Oracle-compatibility. So let's
just do it. No matter it seems bizarre (like 999.99EEEE).

> That's why the scientific notation pattern is "EEEE" and not just "E".
> 
I don't know. Ask Oracle. ;) We don't need to bother with that. Let's just be
compatible.


--  Euler Taveira de Oliveira http://www.timbira.com/


Re: WIP: to_char, support for EEEE format

From
Pavel Stehule
Date:
2009/7/30 Tom Lane <tgl@sss.pgh.pa.us>:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>> 2009/7/30 Robert Haas <robertmhaas@gmail.com>:
>>> On Thu, Jul 30, 2009 at 10:35 AM, Brendan Jurd<direvus@gmail.com> wrote:
>>>> Hmm. For what it's worth, I think Pavel makes a good point about the
>>>> number of exponent digits -- a large chunk of the use case for numeric
>>>> formatting would be fixed-width reporting.
>
> +1.  If you aren't trying to get the format exactly so, it's not clear
> why you're bothering with to_char() at all.
>
>> Maybe we should to support some modificator like Large EEEE - LEEEE or EEEEE
>
> Five (or more?) E's seems like a natural extension to me.  However, that
> still leaves us with the question of what to do when the exponent
> doesn't fit in however many digits we'd like to print.  Seems like the
> options are
>        * print #'s
>        * force the output wider
>        * throw an error
> None of these are very nice, but the first two could cause problems that
> you don't find out until it's too late to fix.  What about throwing an
> error?

I thing, so Oracle raise error. But I don't thing, so it is necessary
repeat all Oracle the behave - mainly when is maybe not too much
practical.

* print #s, and force the output wider has one disadvantage - it
cannot put clean signal about data problem in development time, maybe
we should to add raise warning.

* throw an error should to break "bad" written application in
production, when is too late too. So anybody should have not complete
test data set and there are a problem.

I prefer print # with raising an warning.

Pavel


>
> Sorry if this was already covered in the thread, but: anybody know what
> Oracle does for this?
>
>                        regards, tom lane
>


Re: WIP: to_char, support for EEEE format

From
Robert Haas
Date:
On Thu, Jul 30, 2009 at 1:18 PM, Pavel Stehule<pavel.stehule@gmail.com> wrote:
> 2009/7/30 Tom Lane <tgl@sss.pgh.pa.us>:
>> Pavel Stehule <pavel.stehule@gmail.com> writes:
>>> 2009/7/30 Robert Haas <robertmhaas@gmail.com>:
>>>> On Thu, Jul 30, 2009 at 10:35 AM, Brendan Jurd<direvus@gmail.com> wrote:
>>>>> Hmm. For what it's worth, I think Pavel makes a good point about the
>>>>> number of exponent digits -- a large chunk of the use case for numeric
>>>>> formatting would be fixed-width reporting.
>>
>> +1.  If you aren't trying to get the format exactly so, it's not clear
>> why you're bothering with to_char() at all.
>>
>>> Maybe we should to support some modificator like Large EEEE - LEEEE or EEEEE
>>
>> Five (or more?) E's seems like a natural extension to me.  However, that
>> still leaves us with the question of what to do when the exponent
>> doesn't fit in however many digits we'd like to print.  Seems like the
>> options are
>>        * print #'s
>>        * force the output wider
>>        * throw an error
>> None of these are very nice, but the first two could cause problems that
>> you don't find out until it's too late to fix.  What about throwing an
>> error?
>
> I thing, so Oracle raise error. But I don't thing, so it is necessary
> repeat all Oracle the behave - mainly when is maybe not too much
> practical.
>
> * print #s, and force the output wider has one disadvantage - it
> cannot put clean signal about data problem in development time, maybe
> we should to add raise warning.
>
> * throw an error should to break "bad" written application in
> production, when is too late too. So anybody should have not complete
> test data set and there are a problem.
>
> I prefer print # with raising an warning.

It seems like the discussion here has kind of died.  We need to settle
on an approach and get a final patch soon, or else defer this until
next CommitFest.

...Robert


Re: WIP: to_char, support for EEEE format

From
Pavel Stehule
Date:
2009/8/2 Robert Haas <robertmhaas@gmail.com>:
> On Thu, Jul 30, 2009 at 1:18 PM, Pavel Stehule<pavel.stehule@gmail.com> wrote:
>> 2009/7/30 Tom Lane <tgl@sss.pgh.pa.us>:
>>> Pavel Stehule <pavel.stehule@gmail.com> writes:
>>>> 2009/7/30 Robert Haas <robertmhaas@gmail.com>:
>>>>> On Thu, Jul 30, 2009 at 10:35 AM, Brendan Jurd<direvus@gmail.com> wrote:
>>>>>> Hmm. For what it's worth, I think Pavel makes a good point about the
>>>>>> number of exponent digits -- a large chunk of the use case for numeric
>>>>>> formatting would be fixed-width reporting.
>>>
>>> +1.  If you aren't trying to get the format exactly so, it's not clear
>>> why you're bothering with to_char() at all.
>>>
>>>> Maybe we should to support some modificator like Large EEEE - LEEEE or EEEEE
>>>
>>> Five (or more?) E's seems like a natural extension to me.  However, that
>>> still leaves us with the question of what to do when the exponent
>>> doesn't fit in however many digits we'd like to print.  Seems like the
>>> options are
>>>        * print #'s
>>>        * force the output wider
>>>        * throw an error
>>> None of these are very nice, but the first two could cause problems that
>>> you don't find out until it's too late to fix.  What about throwing an
>>> error?
>>
>> I thing, so Oracle raise error. But I don't thing, so it is necessary
>> repeat all Oracle the behave - mainly when is maybe not too much
>> practical.
>>
>> * print #s, and force the output wider has one disadvantage - it
>> cannot put clean signal about data problem in development time, maybe
>> we should to add raise warning.
>>
>> * throw an error should to break "bad" written application in
>> production, when is too late too. So anybody should have not complete
>> test data set and there are a problem.
>>
>> I prefer print # with raising an warning.
>
> It seems like the discussion here has kind of died.  We need to settle
> on an approach and get a final patch soon, or else defer this until
> next CommitFest.

Tom, please, can you write your opinion on my last proposal - print
### with raise warning.

regards
Pavel
>
> ...Robert
>


Re: WIP: to_char, support for EEEE format

From
Tom Lane
Date:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> Tom, please, can you write your opinion on my last proposal - print
> ### with raise warning.

The idea of printing a warning seems completely horrid to me.  From a
logical point of view, either we think it's an error or we don't.  From
a practical point of view, warnings usually accomplish little except to
bloat log files that a human might or might not ever look at.

The real bottom line for to_char issues is almost always that we should
do what Oracle does.  Has anyone checked this behavior on Oracle?
        regards, tom lane


Re: WIP: to_char, support for EEEE format

From
Euler Taveira de Oliveira
Date:
Tom Lane escreveu:
> The real bottom line for to_char issues is almost always that we should
> do what Oracle does.  Has anyone checked this behavior on Oracle?
> 
That was my point too. See [1].

[1] http://archives.postgresql.org/pgsql-hackers/2009-07/msg01870.php


--  Euler Taveira de Oliveira http://www.timbira.com/


Re: WIP: to_char, support for EEEE format

From
Pavel Stehule
Date:
2009/8/2 Euler Taveira de Oliveira <euler@timbira.com>:
> Tom Lane escreveu:
>> The real bottom line for to_char issues is almost always that we should
>> do what Oracle does.  Has anyone checked this behavior on Oracle?
>>
> That was my point too. See [1].
>
> [1] http://archives.postgresql.org/pgsql-hackers/2009-07/msg01870.php
>

so, Brendan, please, can you adjust patch to raise exception like Oracle?

Thank You
Pavel
>
> --
>  Euler Taveira de Oliveira
>  http://www.timbira.com/
>


Re: WIP: to_char, support for EEEE format

From
Brendan Jurd
Date:
2009/8/3 Pavel Stehule <pavel.stehule@gmail.com>:
> 2009/8/2 Euler Taveira de Oliveira <euler@timbira.com>:
>> Tom Lane escreveu:
>>> The real bottom line for to_char issues is almost always that we should
>>> do what Oracle does.  Has anyone checked this behavior on Oracle?
>>>
>> That was my point too. See [1].
>>
>> [1] http://archives.postgresql.org/pgsql-hackers/2009-07/msg01870.php
>>
>
> so, Brendan, please, can you adjust patch to raise exception like Oracle?
>

Well, the examples Euler posted in the linked message above were using
E+308.  If I'm reading the Oracle docs correctly, that would have
triggered Oracle's data type overflow error before even getting to
to_char(); Oracle's NUMBER type only supports up to E+126.  So we
still don't really know how Oracle handles a (legal) value with too
many exponent digits for EEEE.

Euler, could you post results for a number which fits within Oracle's
data type but has three exponent digits (like 1E+100)?

Cheers,
BJ


Re: WIP: to_char, support for EEEE format

From
Euler Taveira de Oliveira
Date:
Brendan Jurd escreveu:
> Well, the examples Euler posted in the linked message above were using
> E+308.  If I'm reading the Oracle docs correctly, that would have
> triggered Oracle's data type overflow error before even getting to
> to_char(); Oracle's NUMBER type only supports up to E+126.  So we
> still don't really know how Oracle handles a (legal) value with too
> many exponent digits for EEEE.
> 
As I said in a prior e-mail, Oracle has a diferent overflow limit (-84 to 127).
In PostgreSQL, the numeric datatype can have up to 1000 digits (ie 1e+999) and
the double precision datatype can have up to 309 digits (ie 1e-307 or 1e+308).
We should support up to 3 exponent digits so all of our native datatypes are
covered by the to_char() function.

> Euler, could you post results for a number which fits within Oracle's
> data type but has three exponent digits (like 1E+100)?
> 
I don't access to an Oracle Server now but it works fine up to the 127 limit.
And differently to what Pavel proposed, the number of E's is not related to
the number of characters (at least not anymore). So I would like to see the
EEEE being used if we have 2 or 3 exponent digits (that is the same behavior
Oracle has).


--  Euler Taveira de Oliveira http://www.timbira.com/


Re: WIP: to_char, support for EEEE format

From
Tom Lane
Date:
Euler Taveira de Oliveira <euler@timbira.com> writes:
> As I said in a prior e-mail, Oracle has a diferent overflow limit (-84 to 127).
> In PostgreSQL, the numeric datatype can have up to 1000 digits (ie 1e+999) and
> the double precision datatype can have up to 309 digits (ie 1e-307 or 1e+308).
> We should support up to 3 exponent digits so all of our native datatypes are
> covered by the to_char() function.

Uh, no, we had better support more.  The actual limit of the current
numeric format is 1e+131072.

As long as we consider that EEEE should emit as many exponent digits
as needed, this isn't particularly critical.  But it would be if we
try to specify an exact number of output digits.
        regards, tom lane


Re: WIP: to_char, support for EEEE format

From
Brendan Jurd
Date:
2009/8/3 Euler Taveira de Oliveira <euler@timbira.com>:
> Brendan Jurd escreveu:
>> Euler, could you post results for a number which fits within Oracle's
>> data type but has three exponent digits (like 1E+100)?
>>
> I don't access to an Oracle Server now but it works fine up to the 127 limit.
> And differently to what Pavel proposed, the number of E's is not related to
> the number of characters (at least not anymore). So I would like to see the
> EEEE being used if we have 2 or 3 exponent digits (that is the same behavior
> Oracle has).
>

Okay, so Oracle just forces the output wider to accomodate the
exponent (i.e., you can't rely on it being fixed-width).

I can adjust the patch to imitate this behaviour; should be able to
post a new revision within 24 hours.

Cheers,
BJ


Re: WIP: to_char, support for EEEE format

From
Brendan Jurd
Date:
2009/8/3 Brendan Jurd <direvus@gmail.com>:
> Okay, so Oracle just forces the output wider to accomodate the
> exponent (i.e., you can't rely on it being fixed-width).
>
> I can adjust the patch to imitate this behaviour; should be able to
> post a new revision within 24 hours.
>

Please find attached version 5 of the patch, which allows for
exponents with any number of digits (up to a total string length of
MAXDOUBLEWIDTH).

Cheers,
BJ

Attachment

Re: WIP: to_char, support for EEEE format

From
Brendan Jurd
Date:
2009/8/3 Tom Lane <tgl@sss.pgh.pa.us>:
> Euler Taveira de Oliveira <euler@timbira.com> writes:
>> As I said in a prior e-mail, Oracle has a diferent overflow limit (-84 to 127).
>> In PostgreSQL, the numeric datatype can have up to 1000 digits (ie 1e+999) and
>> the double precision datatype can have up to 309 digits (ie 1e-307 or 1e+308).
>> We should support up to 3 exponent digits so all of our native datatypes are
>> covered by the to_char() function.
>
> Uh, no, we had better support more.  The actual limit of the current
> numeric format is 1e+131072.
>
> As long as we consider that EEEE should emit as many exponent digits
> as needed, this isn't particularly critical.  But it would be if we
> try to specify an exact number of output digits.
>

Well, I tried this and as it turns out the patch casts the value to a
float8 in order to pass it on to snprintf for sci-notation formatting.See the following chunk in numeric_to_char():
else if (IS_EEEE(&Num)){    float8 val;
    val = DatumGetFloat8(DirectFunctionCall1(numeric_float8,
NumericGetDatum(value)));
    numstr = orgnum = (char *) palloc(MAXDOUBLEWIDTH + 1);    len = snprintf(orgnum, MAXDOUBLEWIDTH + 1, "%.*e",
Num.post,val);} 

This leads to the following behaviour:

=# select to_char(1e+1000::numeric, '9.99EEEE');
ERROR:
"10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"
is out of range for type double precision

If we want to support the full range of numeric values with EEEE, I
guess we would need to write our own implementation of scientific
notation rather than depend on sprintf()'s?

Cheers,
BJ


Re: WIP: to_char, support for EEEE format

From
Tom Lane
Date:
Brendan Jurd <direvus@gmail.com> writes:
> Well, I tried this and as it turns out the patch casts the value to a
> float8 in order to pass it on to snprintf for sci-notation formatting.

Well, that's pretty dumb.  Quite aside from the range problem, that
would mean that you lose everything past the sixteenth or so digit.
I think that's sufficient grounds for bouncing the patch back for
rework right there.

What I'd consider instead is calling numeric_out and then working
with the result of that.  It would always be f-format, so you'd
have to do your own conversion to e-format, but you could do it
without any risk of precision or range loss.
        regards, tom lane


Re: WIP: to_char, support for EEEE format

From
Brendan Jurd
Date:
2009/8/4 Tom Lane <tgl@sss.pgh.pa.us>:
> Brendan Jurd <direvus@gmail.com> writes:
>> Well, I tried this and as it turns out the patch casts the value to a
>> float8 in order to pass it on to snprintf for sci-notation formatting.
>
> Well, that's pretty dumb.  Quite aside from the range problem, that
> would mean that you lose everything past the sixteenth or so digit.
> I think that's sufficient grounds for bouncing the patch back for
> rework right there.
>

I agree.

> What I'd consider instead is calling numeric_out and then working
> with the result of that.  It would always be f-format, so you'd
> have to do your own conversion to e-format, but you could do it
> without any risk of precision or range loss.
>

Yeah, I figured as much.  I'll see what I can do about reworking the
numeric case.  I should be able to post a new revision in the next day
or so, but I certainly won't cry foul if this gets punted to
September.

Cheers,
BJ


Re: WIP: to_char, support for EEEE format

From
Tom Lane
Date:
Brendan Jurd <direvus@gmail.com> writes:
> 2009/8/4 Tom Lane <tgl@sss.pgh.pa.us>:
>> What I'd consider instead is calling numeric_out and then working
>> with the result of that. �It would always be f-format, so you'd
>> have to do your own conversion to e-format, but you could do it
>> without any risk of precision or range loss.

> Yeah, I figured as much.  I'll see what I can do about reworking the
> numeric case.  I should be able to post a new revision in the next day
> or so, but I certainly won't cry foul if this gets punted to
> September.

BTW, there's no rule saying you have to fix that strictly within
to_char().  It might make more sense to have numeric.c export a
function that is like numeric_out but produces e-style output.
Your choice as the patch writer, but keep it in mind ...
        regards, tom lane


Re: WIP: to_char, support for EEEE format

From
Brendan Jurd
Date:
2009/8/4 Tom Lane <tgl@sss.pgh.pa.us>:
> BTW, there's no rule saying you have to fix that strictly within
> to_char().  It might make more sense to have numeric.c export a
> function that is like numeric_out but produces e-style output.
> Your choice as the patch writer, but keep it in mind ...
>

Ah, thanks for the suggestion.  I think you're right, numeric.c would
be a nice place for this code.  Going directly from the guts of a
NumericVar to a scientific notation seems more elegant than performing
cut-and-paste style surgery on the string from numeric_out().

It does require a bit more effort and headspace than I was expecting.
I'm still keen to do it, but my "next day or so" timeframe may not be
so realistic.  It's looking more like "by the end of the week".

Cheers,
BJ


Re: WIP: to_char, support for EEEE format

From
Brendan Jurd
Date:
2009/8/3 Tom Lane <tgl@sss.pgh.pa.us>:
> Uh, no, we had better support more.  The actual limit of the current
> numeric format is 1e+131072.
>

Given your comment above I'm thinking it reasonable to use an int32 to
store the exponent -- will that be safe?

That would allow for a maximum of 10 exponent digits.  As an aside, I
note that int4out() hardcodes the maximum number of digits rather than
exposing a constant (c.f. MAXINT8LEN in int8.c).  I'm considering
adding MAXINT2LEN and MAXINT4LEN to int.c in passing.  Excessive
tinkering, or worthy improvement?

Cheers,
BJ


Re: WIP: to_char, support for EEEE format

From
Tom Lane
Date:
Brendan Jurd <direvus@gmail.com> writes:
> 2009/8/3 Tom Lane <tgl@sss.pgh.pa.us>:
>> Uh, no, we had better support more. �The actual limit of the current
>> numeric format is 1e+131072.

> Given your comment above I'm thinking it reasonable to use an int32 to
> store the exponent -- will that be safe?

Seems reasonable to me.

> That would allow for a maximum of 10 exponent digits.  As an aside, I
> note that int4out() hardcodes the maximum number of digits rather than
> exposing a constant (c.f. MAXINT8LEN in int8.c).  I'm considering
> adding MAXINT2LEN and MAXINT4LEN to int.c in passing.  Excessive
> tinkering, or worthy improvement?

Don't really care.  short and int are the same sizes on all platforms of
interest, and are likely to remain so --- if they don't, we'll have way
more places to fix than this one.  INT8 has historically been more
platform-dependent.
        regards, tom lane


Re: WIP: to_char, support for EEEE format

From
Brendan Jurd
Date:
2009/8/9 Tom Lane <tgl@sss.pgh.pa.us>:
> Brendan Jurd <direvus@gmail.com> writes:
>> That would allow for a maximum of 10 exponent digits.  As an aside, I
>> note that int4out() hardcodes the maximum number of digits rather than
>> exposing a constant (c.f. MAXINT8LEN in int8.c).  I'm considering
>> adding MAXINT2LEN and MAXINT4LEN to int.c in passing.  Excessive
>> tinkering, or worthy improvement?
>
> Don't really care.  short and int are the same sizes on all platforms of
> interest, and are likely to remain so --- if they don't, we'll have way
> more places to fix than this one.  INT8 has historically been more
> platform-dependent.
>

"Excessive tinkering" it is, then.  I went ahead with hardcoding the
expected 10 digits.

Here's version 6 of the EEEE patch, now with an all-new implementation
of (normalised) scientific notation in numeric.c, via the functions
numeric_out_sci() and get_str_from_var_sci().  So EEEE should now be
able to represent the full gamut of the numeric type.

regression=# select to_char(-1e-1000, '9.99EEEE');
   to_char
-------------
 -1.00e-1000
(1 row)

I also noticed that EEEE wasn't outputting a leading space for
positive numbers like the rest of to_char does.  This meant that
positive and negative numbers weren't aligned.  The logic to do this
for ordinary decimal representations is tied up in NUM_processor() and
NUM_numpart_to_char(), so unfortunately I wasn't able to reuse it.
Instead I just frobbed the code in the *_to_char() functions to make
this work.

Noting that Robert has dropped the patch from the July CF (and having
no objection to that) I'm going to submit this for September.

Cheers,
BJ

Attachment

Re: WIP: to_char, support for EEEE format

From
Alvaro Herrera
Date:
Brendan Jurd escribió:

> Here's version 6 of the EEEE patch, now with an all-new implementation
> of (normalised) scientific notation in numeric.c, via the functions
> numeric_out_sci() and get_str_from_var_sci().  So EEEE should now be
> able to represent the full gamut of the numeric type.

I noticed an ugly pattern in NUMDesc_prepare calling a cleanup function
before every ereport(ERROR).  I think it's cleaner to replace that with
a PG_TRY block; see attached.

I didn't go over the patch in much more detail.  (But the
numeric_out_sci business got me thinking.)

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Attachment

Re: WIP: to_char, support for EEEE format

From
Brendan Jurd
Date:
2009/8/9 Alvaro Herrera <alvherre@commandprompt.com>:
> Brendan Jurd escribió:
>
>> Here's version 6 of the EEEE patch, now with an all-new implementation
>> of (normalised) scientific notation in numeric.c, via the functions
>> numeric_out_sci() and get_str_from_var_sci().  So EEEE should now be
>> able to represent the full gamut of the numeric type.
>
> I noticed an ugly pattern in NUMDesc_prepare calling a cleanup function
> before every ereport(ERROR).  I think it's cleaner to replace that with
> a PG_TRY block; see attached.

Looks nice -- although doesn't have anything to do with the EEEE patch
so perhaps deserves its own thread?

>
> I didn't go over the patch in much more detail.  (But the
> numeric_out_sci business got me thinking.)

Got you thinking about what?  I'd welcome any comments you have.

Cheers,
BJ


Re: WIP: to_char, support for EEEE format

From
Alvaro Herrera
Date:
Brendan Jurd escribió:
> 2009/8/9 Alvaro Herrera <alvherre@commandprompt.com>:

> > I noticed an ugly pattern in NUMDesc_prepare calling a cleanup function
> > before every ereport(ERROR).  I think it's cleaner to replace that with
> > a PG_TRY block; see attached.
> 
> Looks nice -- although doesn't have anything to do with the EEEE patch
> so perhaps deserves its own thread?

Yes, it just popped up while skimming the patch.

> > I didn't go over the patch in much more detail.  (But the
> > numeric_out_sci business got me thinking.)
> 
> Got you thinking about what?  I'd welcome any comments you have.

I wondered if it should just return char *.  If we want to have this
functionality as its own fmgr-blessed function, shouldn't it return
text instead of cstring?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: WIP: to_char, support for EEEE format

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Got you thinking about what?  I'd welcome any comments you have.

> I wondered if it should just return char *.  If we want to have this
> functionality as its own fmgr-blessed function, shouldn't it return
> text instead of cstring?

If we expose it at fmgr level it should definitely not return cstring.
However, I wasn't foreseeing doing that --- does the submitted patch
expose it?
        regards, tom lane


Re: WIP: to_char, support for EEEE format

From
Brendan Jurd
Date:
2009/8/11 Tom Lane <tgl@sss.pgh.pa.us>:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> I wondered if it should just return char *.  If we want to have this
>> functionality as its own fmgr-blessed function, shouldn't it return
>> text instead of cstring?
>
> If we expose it at fmgr level it should definitely not return cstring.
> However, I wasn't foreseeing doing that --- does the submitted patch
> expose it?
>

Sorry, I'm a little hazy on the terminology here.  If by "expose it at
fmgr level" you mean did I add it to pg_proc, then no, I didn't.

The function is declared in builtins.h as "extern Datum
numeric_out_sci(PG_FUNCTION_ARGS);", and called from formatting.c
using the  DirectFunctionCall arrangement.

numeric_out_sci() returns using PG_RETURN_CSTRING, same as numeric_out does.

If this is the wrong way to expose the function, please let me know
and I'll happily fix it.

Cheers,
BJ


Re: WIP: to_char, support for EEEE format

From
Tom Lane
Date:
Brendan Jurd <direvus@gmail.com> writes:
> 2009/8/11 Tom Lane <tgl@sss.pgh.pa.us>:
>> If we expose it at fmgr level it should definitely not return cstring.
>> However, I wasn't foreseeing doing that --- does the submitted patch
>> expose it?

> Sorry, I'm a little hazy on the terminology here.  If by "expose it at
> fmgr level" you mean did I add it to pg_proc, then no, I didn't.

OK.

> The function is declared in builtins.h as "extern Datum
> numeric_out_sci(PG_FUNCTION_ARGS);", and called from formatting.c
> using the  DirectFunctionCall arrangement.

If it's not meant to be in pg_proc, I wouldn't bother with using the V1
call protocol for it.  "extern char *numeric_out_sci(Numeric x)" would
be sufficient, and less notation on both caller and callee sides.
        regards, tom lane


Re: WIP: to_char, support for EEEE format

From
Brendan Jurd
Date:
2009/8/11 Tom Lane <tgl@sss.pgh.pa.us>:
> If it's not meant to be in pg_proc, I wouldn't bother with using the V1
> call protocol for it.  "extern char *numeric_out_sci(Numeric x)" would
> be sufficient, and less notation on both caller and callee sides.
>

Thanks Tom.  I have removed the V1 stuff as you suggest, and placed
the declaration in numeric.h.

Here's version 7.

Cheers,
BJ

Attachment

Re: WIP: to_char, support for EEEE format

From
Tom Lane
Date:
Brendan Jurd <direvus@gmail.com> writes:
> Thanks Tom.  I have removed the V1 stuff as you suggest, and placed
> the declaration in numeric.h.

> Here's version 7.

Working through this now, and I noticed that the example added to the
manual seems to be wrong:
       <entry><literal>to_char(0.000485, '9.99EEEE')</literal></entry>       <entry><literal>'
4.850e-04'</literal></entry>

With 9.99 as the pattern, I'd expect (and indeed I get) 4.85e-04
not 4.850e-04.  This is correct behavior, no?

Also, I'm wondering what should happen with

regression=# select to_char(0.000485, '99.99EEEE'); to_char  
----------- 4.85e-04
(1 row)

Doesn't seem quite right.  Should we throw error if the number of 9's
before the decimal point isn't 1?
        regards, tom lane


Re: WIP: to_char, support for EEEE format

From
Alvaro Herrera
Date:
Tom Lane escribió:

> Also, I'm wondering what should happen with
> 
> regression=# select to_char(0.000485, '99.99EEEE');
>   to_char  
> -----------
>   4.85e-04
> (1 row)
> 
> Doesn't seem quite right.  Should we throw error if the number of 9's
> before the decimal point isn't 1?

No, see
http://archives.postgresql.org/message-id/4A68FAE4.50206@timbira.com

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: WIP: to_char, support for EEEE format

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane escribi�:
>> Doesn't seem quite right.  Should we throw error if the number of 9's
>> before the decimal point isn't 1?

> No, see
> http://archives.postgresql.org/message-id/4A68FAE4.50206@timbira.com

Ah, nothing like being bug-compatible with a bad implementation.
But I agree, if Oracle ignores the number of 9's there then we
should too.

BTW, this patch adds more NUM_cache_remove() calls.  I'm planning
to commit it that way, unless you're just about to commit your PG_TRY
change?  I agree with doing that, but figured it should be a separate
commit.
        regards, tom lane


Re: WIP: to_char, support for EEEE format

From
Alvaro Herrera
Date:
Tom Lane escribió:

> BTW, this patch adds more NUM_cache_remove() calls.  I'm planning
> to commit it that way, unless you're just about to commit your PG_TRY
> change?  I agree with doing that, but figured it should be a separate
> commit.

No, go ahead, I will commit that separately.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: WIP: to_char, support for EEEE format

From
Brendan Jurd
Date:
2009/8/11 Tom Lane <tgl@sss.pgh.pa.us>:
> Working through this now, and I noticed that the example added to the
> manual seems to be wrong:
>
>        <entry><literal>to_char(0.000485, '9.99EEEE')</literal></entry>
>        <entry><literal>' 4.850e-04'</literal></entry>
>
> With 9.99 as the pattern, I'd expect (and indeed I get) 4.85e-04
> not 4.850e-04.  This is correct behavior, no?

Correct.  I apologise for the oversight.

The example output should lose the trailing zero, or else the example
query needs an extra '9' after the decimal point.  I don't think it
makes much difference which.

Cheers,
BJ


Re: WIP: to_char, support for EEEE format

From
Tom Lane
Date:
Brendan Jurd <direvus@gmail.com> writes:
> Here's version 7.

Applied with a couple of corrections: the numeric case wasn't dealing
with NaNs in the same way as the float cases, and the int8 case was
converting to float8 which would lose precision.  I made it go through
numeric instead, which is pretty expensive but I doubt this is worth
expending extra code on.
        regards, tom lane


Re: WIP: to_char, support for EEEE format

From
Pavel Stehule
Date:
2009/8/10 Tom Lane <tgl@sss.pgh.pa.us>:
> Brendan Jurd <direvus@gmail.com> writes:
>> Here's version 7.
>
> Applied with a couple of corrections: the numeric case wasn't dealing
> with NaNs in the same way as the float cases, and the int8 case was
> converting to float8 which would lose precision.  I made it go through
> numeric instead, which is pretty expensive but I doubt this is worth
> expending extra code on.
>
>                        regards, tom lane
>

It's nice. I am playing with it, and now I found some potential issue.
The parser is maybe too tolerant:

postgres=# select to_char(3.14323,'9.9(aaaaaEEEE');to_char
---------- 3.1e+00
(1 row)

regards
Pavel Stehule


Re: WIP: to_char, support for EEEE format

From
Brendan Jurd
Date:
2009/8/11 Pavel Stehule <pavel.stehule@gmail.com>:
> It's nice. I am playing with it, and now I found some potential issue.
> The parser is maybe too tolerant:
>
> postgres=# select to_char(3.14323,'9.9(aaaaaEEEE');
>  to_char
> ----------
>  3.1e+00
> (1 row)
>

I guess we *could* add code to throw an error where the 9's aren't
placed immediately before the E's in the format pattern, but to be
honest I can't work up any enthusiasm about it.

Cheers,
BJ


Re: WIP: to_char, support for EEEE format

From
Brendan Jurd
Date:
2009/8/11 Tom Lane <tgl@sss.pgh.pa.us>:
> Brendan Jurd <direvus@gmail.com> writes:
>> Here's version 7.
>
> Applied with a couple of corrections: the numeric case wasn't dealing
> with NaNs in the same way as the float cases,

Thanks for that.

I do think that the whole business of printing "#.#####" is highly
bogus, and I might look at getting it to throw errors instead as a
separate patch.

I've added a TODO for same.

Cheers,
BJ


TODO list, was Re: WIP: to_char, support for EEEE format

From
Bruce Momjian
Date:
Brendan Jurd wrote:
> 2009/8/11 Tom Lane <tgl@sss.pgh.pa.us>:
> > Brendan Jurd <direvus@gmail.com> writes:
> >> Here's version 7.
> >
> > Applied with a couple of corrections: the numeric case wasn't dealing
> > with NaNs in the same way as the float cases,
> 
> Thanks for that.
> 
> I do think that the whole business of printing "#.#####" is highly
> bogus, and I might look at getting it to throw errors instead as a
> separate patch.
> 
> I've added a TODO for same.

Everyone, please, when you update the TODO list, put some text in the
"Summary" box at the bottom before you commit the change so people know
what has been changed.  I usually just copy the first sentence of my new
text and add a tag at the beginning like "Added" or "Done".

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +