Thread: Zero-padding and zero-masking fixes for to_char(float)

Zero-padding and zero-masking fixes for to_char(float)

From

Bruce Momjian

Date:

18 March 2015, 21:52:50

In September, while researching the to_char() buffer overflow bugs fixed
in 9.4.1 (commit 0150ab567bcf5e5913e2b62a1678f84cc272441f), I found an
inconsistency in how to_char() does zero-padding for float4/8 values.
Now that 9.4.1 is released and I am home for a while, I am ready to
address this.

For example, to_char(int4) properly pads with trailing zeros, e.g.

    SELECT to_char(int4 '1999999999', '9999999999999999D' || repeat('9', 1000));
    ------
    1999999999.000000000000000000000000000000...

Numeric does the same thing:

    SELECT to_char(numeric '99999999999', '9999999999999999D' || repeat('9', 1000));
    ------
    99999999999.00000000000000000000000000000...

However, float4/8 do not supply the requested zero padding:

    SELECT to_char(float4 '99999999999', '9999999999999999D' || repeat('9', 1000));
    ------
    99999997952

and

    SELECT to_char(float8 '99999999999', '9999999999999999D' || repeat('9', 1000));
    ------
    99999999999.0000

float4/8 are padding to the internal precision, while int4/numeric are
padding based on the requested precision.  This is inconsistent.

The first attached patch fixes this, and also zeros the "junk" digits
which exceed the precision of the underlying type:

    SELECT to_char(float4 '99999999999', '9999999999999999D' || repeat('9', 1000));
    ------
    99999900000.00000000000000000000000000000...

    SELECT to_char(float8 '99999999999', '9999999999999999D' || repeat('9', 1000));
    ------
    99999999999.0000000000000000000000000000....

This "junk" digit zeroing matches the Oracle behavior:

    SELECT to_char(1.123456789123456789123456789d, '9.9999999999999999999999999999999999999') as x from dual;
    ------
    1.1234567891234568000000000000000000000

Our output with the patch would be:

    SELECT to_char(float8 '1.123456789123456789123456789', '9.9999999999999999999999999999999999999');
    ------
    1.1234567891234500000000000000000000000

which is pretty close.

The second patch adds regression tests for these.

I would like to apply this for 9.5 while I remember what I was doing,
but I guess now that I have written this email, I will be able to keep
it for 9.6 if people prefer.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Attachment

Re: Zero-padding and zero-masking fixes for to_char(float)

From

Bruce Momjian

Date:

22 March 2015, 01:46:29

On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote:
> In September, while researching the to_char() buffer overflow bugs fixed
> in 9.4.1 (commit 0150ab567bcf5e5913e2b62a1678f84cc272441f), I found an
> inconsistency in how to_char() does zero-padding for float4/8 values. 
> Now that 9.4.1 is released and I am home for a while, I am ready to
> address this.
...
> float4/8 are padding to the internal precision, while int4/numeric are
> padding based on the requested precision.  This is inconsistent.
> 
> The first attached patch fixes this, and also zeros the "junk" digits
> which exceed the precision of the underlying type:

Patch applied.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +

Re: Zero-padding and zero-masking fixes for to_char(float)

From

David Rowley

Date:

22 March 2015, 05:05:00

On 22 March 2015 at 14:46, Bruce Momjian <bruce@momjian.us> wrote:

On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote:
> In September, while researching the to_char() buffer overflow bugs fixed
> in 9.4.1 (commit 0150ab567bcf5e5913e2b62a1678f84cc272441f), I found an
> inconsistency in how to_char() does zero-padding for float4/8 values.
> Now that 9.4.1 is released and I am home for a while, I am ready to
> address this.
...
> float4/8 are padding to the internal precision, while int4/numeric are
> padding based on the requested precision. This is inconsistent.
>
> The first attached patch fixes this, and also zeros the "junk" digits
> which exceed the precision of the underlying type:

Patch applied.

This seems to have broken jacana. Looks like MSVC by default has a 3 digit exponent.

Going by this: https://msdn.microsoft.com/en-us/library/0fatw238(v=vs.80).aspx it seems that it can quite easily be set back to 2.

I've attached a patch which seems to fix the issue.

Regards

David Rowley

Attachment

2digit_exponent.diff

Re: Zero-padding and zero-masking fixes for to_char(float)

From

Tom Lane

Date:

22 March 2015, 05:43:08

David Rowley <dgrowleyml@gmail.com> writes:
> This seems to have broken jacana.  Looks like MSVC by default has a 3 digit
> exponent.

jacana was broken before this patch; but some other Windows critters
are now unhappy as well.

> Going by this:
> https://msdn.microsoft.com/en-us/library/0fatw238(v=vs.80).aspx it seems
> that it can quite easily be set back to 2.

> I've attached a patch which seems to fix the issue.

That seems likely to have side-effects far beyond what's appropriate.
We have gone out of our way to accommodate 3-digit exponents in other
tests.  What I want to know is why this patch created a 3-digit output
where there was none before.
        regards, tom lane

Re: Zero-padding and zero-masking fixes for to_char(float)

From

Bruce Momjian

Date:

22 March 2015, 15:22:34

On Sun, Mar 22, 2015 at 01:42:56AM -0400, Tom Lane wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
> > This seems to have broken jacana.  Looks like MSVC by default has a 3 digit
> > exponent.
> 
> jacana was broken before this patch; but some other Windows critters
> are now unhappy as well.
> 
> > Going by this:
> > https://msdn.microsoft.com/en-us/library/0fatw238(v=vs.80).aspx it seems
> > that it can quite easily be set back to 2.
> 
> > I've attached a patch which seems to fix the issue.
> 
> That seems likely to have side-effects far beyond what's appropriate.
> We have gone out of our way to accommodate 3-digit exponents in other
> tests.  What I want to know is why this patch created a 3-digit output
> where there was none before.

I was wondering the same thing too, but when I grep'ed the regression
output files looking for exponents, I found float4-exp-three-digits.out
and int8-exp-three-digits.out.  I think this means we have had this
issue before, and we are going to have to either create a
numeric-exp-three-digits.out alternate output file, move the test to one
of the existing exp-three-digits files, or remove the tests.  Sorry, I
didn't expect any problems with this patch as it was so small and
localized.

What has me more concerned is the Solaris 10 failure.  This query:
SELECT to_char(float8 '99999999999', '9999999999999999D' || repeat('9', 1000));

expects this:
99999999999.00000000000...

but on Solaris 10 gets this:
.000000000000000000

Yes, the nines are gone, and only this query is failing.  Oddly, this
query did not fail, though the only difference is fewer decimal digits:
SELECT to_char(float8 '99999999999', '9999999999999999D99999999');

This smells like a libc bug, e.g. OmniOS 5.11 passed the test.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +

Re: Zero-padding and zero-masking fixes for to_char(float)

From

Noah Misch

Date:

22 March 2015, 16:46:20

On Sun, Mar 22, 2015 at 11:22:26AM -0400, Bruce Momjian wrote:
> What has me more concerned is the Solaris 10 failure.  This query:
> 
>     SELECT to_char(float8 '99999999999', '9999999999999999D' || repeat('9', 1000));
> 
> expects this:
> 
>     99999999999.00000000000...
> 
> but on Solaris 10 gets this:
> 
>     .000000000000000000
> 
> Yes, the nines are gone, and only this query is failing.  Oddly, this
> query did not fail, though the only difference is fewer decimal digits:
> 
>     SELECT to_char(float8 '99999999999', '9999999999999999D99999999');
> 
> This smells like a libc bug, e.g. OmniOS 5.11 passed the test.

Use of the "f" conversion specifier with precision greater than 512 is not
portable; I get a similar diff on AIX 7.  Until this patch, PostgreSQL would
not use arbitrarily-large precisions on that conversion specifier.  (Who would
guess, but the "e" conversion specifier is not affected.)

I recommend adding a "configure" test to use our snprintf.c replacements if
sprintf("%.*f", 65536, 99999999999.0) gives unexpected output.

Re: Zero-padding and zero-masking fixes for to_char(float)

From

Noah Misch

Date:

22 March 2015, 20:41:28

When you posted this, I made a note to review it.

On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote:
> This "junk" digit zeroing matches the Oracle behavior:
> 
>     SELECT to_char(1.123456789123456789123456789d, '9.9999999999999999999999999999999999999') as x from dual;
>     ------
>     1.1234567891234568000000000000000000000
> 
> Our output with the patch would be:
> 
>     SELECT to_char(float8 '1.123456789123456789123456789', '9.9999999999999999999999999999999999999');
>     ------
>     1.1234567891234500000000000000000000000
> 
> which is pretty close.

PostgreSQL 9.4 returns "1.12345678912346".  Your patch truncates digits past
DBL_DIG, whereas both Oracle and PostgreSQL 9.4 round to the nearest DBL_DIG
digits.  PostgreSQL must continue to round.

These outputs show Oracle treating 17 digits as significant while PostgreSQL
treats 15 digits as significant.  Should we match Oracle in this respect while
we're breaking compatibility anyway?  I tend to think yes.

> *************** int4_to_char(PG_FUNCTION_ARGS)
> *** 5214,5221 ****
>           /* we can do it easily because float8 won't lose any precision */
>           float8        val = (float8) value;
>   
> !         orgnum = (char *) palloc(MAXDOUBLEWIDTH + 1);
> !         snprintf(orgnum, MAXDOUBLEWIDTH + 1, "%+.*e", Num.post, val);
>   
>           /*
>            * Swap a leading positive sign for a space.
> --- 5207,5213 ----
>           /* we can do it easily because float8 won't lose any precision */
>           float8        val = (float8) value;
>   
> !         orgnum = psprintf("%+.*e", Num.post, val);

Neither the submission notes nor the commit messages mentioned it, but this
improves behavior for to_char(int4, text).  Specifically, it helps EEEE
formats with more than about 500 decimal digits:
 SELECT length(to_char(1, '9D' || repeat('9', 800) || 'EEEE'));

Re: Zero-padding and zero-masking fixes for to_char(float)

From

Bruce Momjian

Date:

23 March 2015, 02:50:08

On Sun, Mar 22, 2015 at 12:46:08PM -0400, Noah Misch wrote:
> On Sun, Mar 22, 2015 at 11:22:26AM -0400, Bruce Momjian wrote:
> > What has me more concerned is the Solaris 10 failure.  This query:
> > 
> >     SELECT to_char(float8 '99999999999', '9999999999999999D' || repeat('9', 1000));
> > 
> > expects this:
> > 
> >     99999999999.00000000000...
> > 
> > but on Solaris 10 gets this:
> > 
> >     .000000000000000000
> > 
> > Yes, the nines are gone, and only this query is failing.  Oddly, this
> > query did not fail, though the only difference is fewer decimal digits:
> > 
> >     SELECT to_char(float8 '99999999999', '9999999999999999D99999999');
> > 
> > This smells like a libc bug, e.g. OmniOS 5.11 passed the test.
> 
> Use of the "f" conversion specifier with precision greater than 512 is not
> portable; I get a similar diff on AIX 7.  Until this patch, PostgreSQL would
> not use arbitrarily-large precisions on that conversion specifier.  (Who would
> guess, but the "e" conversion specifier is not affected.)

Yeah.

> I recommend adding a "configure" test to use our snprintf.c replacements if
> sprintf("%.*f", 65536, 99999999999.0) gives unexpected output.

Do we really want to go to our /port snprintf just to handle 512+
digits?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +

Re: Zero-padding and zero-masking fixes for to_char(float)

From

Bruce Momjian

Date:

23 March 2015, 02:53:28

On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote:
> When you posted this, I made a note to review it.
> 
> On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote:
> > This "junk" digit zeroing matches the Oracle behavior:
> > 
> >     SELECT to_char(1.123456789123456789123456789d, '9.9999999999999999999999999999999999999') as x from dual;
> >     ------
> >     1.1234567891234568000000000000000000000
> > 
> > Our output with the patch would be:
> > 
> >     SELECT to_char(float8 '1.123456789123456789123456789', '9.9999999999999999999999999999999999999');
> >     ------
> >     1.1234567891234500000000000000000000000
> > 
> > which is pretty close.
> 
> PostgreSQL 9.4 returns "1.12345678912346".  Your patch truncates digits past
> DBL_DIG, whereas both Oracle and PostgreSQL 9.4 round to the nearest DBL_DIG
> digits.  PostgreSQL must continue to round.

Ah, that rounding is a big problem.  I can't just round because if the
digit to be rounded up is '9', I have to set that to zero and increment
the previous digit, and that could cascade and shift the entire string
one over.  I think I have to go back to the original code that does the
weird computations to figure out how many digits are on the left of the
decimal point, then set the format string after.  I was hoping my patch
could clean that all up, but we now need snprintf to do that rounding
for us.  :-(

> These outputs show Oracle treating 17 digits as significant while PostgreSQL
> treats 15 digits as significant.  Should we match Oracle in this respect while
> we're breaking compatibility anyway?  I tend to think yes.

Uh, I am hesistant to adjust our precision to match Oracle as I don't
know what they are using internally.

> > *************** int4_to_char(PG_FUNCTION_ARGS)
> > *** 5214,5221 ****
> >           /* we can do it easily because float8 won't lose any precision */
> >           float8        val = (float8) value;
> >   
> > !         orgnum = (char *) palloc(MAXDOUBLEWIDTH + 1);
> > !         snprintf(orgnum, MAXDOUBLEWIDTH + 1, "%+.*e", Num.post, val);
> >   
> >           /*
> >            * Swap a leading positive sign for a space.
> > --- 5207,5213 ----
> >           /* we can do it easily because float8 won't lose any precision */
> >           float8        val = (float8) value;
> >   
> > !         orgnum = psprintf("%+.*e", Num.post, val);
> 
> Neither the submission notes nor the commit messages mentioned it, but this
> improves behavior for to_char(int4, text).  Specifically, it helps EEEE
> formats with more than about 500 decimal digits:
> 
>   SELECT length(to_char(1, '9D' || repeat('9', 800) || 'EEEE'));

Wow, that is surprising.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +

Re: Zero-padding and zero-masking fixes for to_char(float)

From

Bruce Momjian

Date:

23 March 2015, 03:00:52

On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote:
> > > *************** int4_to_char(PG_FUNCTION_ARGS)
> > > *** 5214,5221 ****
> > >           /* we can do it easily because float8 won't lose any precision */
> > >           float8        val = (float8) value;
> > >   
> > > !         orgnum = (char *) palloc(MAXDOUBLEWIDTH + 1);
> > > !         snprintf(orgnum, MAXDOUBLEWIDTH + 1, "%+.*e", Num.post, val);
> > >   
> > >           /*
> > >            * Swap a leading positive sign for a space.
> > > --- 5207,5213 ----
> > >           /* we can do it easily because float8 won't lose any precision */
> > >           float8        val = (float8) value;
> > >   
> > > !         orgnum = psprintf("%+.*e", Num.post, val);
> > 
> > Neither the submission notes nor the commit messages mentioned it, but this
> > improves behavior for to_char(int4, text).  Specifically, it helps EEEE
> > formats with more than about 500 decimal digits:
> > 
> >   SELECT length(to_char(1, '9D' || repeat('9', 800) || 'EEEE'));
> 
> Wow, that is surprising.

Ah, yes.  There was a problem where were clipping off int4 EEEE output
to MAXDOUBLEWIDTH, which made no sense, so there was the fix for that as
well.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +

Re: Zero-padding and zero-masking fixes for to_char(float)

From

Tom Lane

Date:

23 March 2015, 04:36:38

Bruce Momjian <bruce@momjian.us> writes:
> On Sun, Mar 22, 2015 at 12:46:08PM -0400, Noah Misch wrote:
>> I recommend adding a "configure" test to use our snprintf.c replacements if
>> sprintf("%.*f", 65536, 99999999999.0) gives unexpected output.

> Do we really want to go to our /port snprintf just to handle 512+
> digits?

I'd rather not go that direction (that is, to using a configure test).
It assumes that all copies of libc on a particular platform behave the
same, which seems like a bad bet to me.  I think we'd be better off to
avoid asking libc to do anything that might not work everywhere.

On the other hand, this line of thought might end up having you
reimplement in formatting.c the same logic I put into snprintf.c
recently, which seems a bit silly.
        regards, tom lane

Re: Zero-padding and zero-masking fixes for to_char(float)

From

Noah Misch

Date:

23 March 2015, 06:49:24

On Mon, Mar 23, 2015 at 12:36:25AM -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > On Sun, Mar 22, 2015 at 12:46:08PM -0400, Noah Misch wrote:
> >> I recommend adding a "configure" test to use our snprintf.c replacements if
> >> sprintf("%.*f", 65536, 99999999999.0) gives unexpected output.
> 
> > Do we really want to go to our /port snprintf just to handle 512+
> > digits?
> 
> I'd rather not go that direction (that is, to using a configure test).
> It assumes that all copies of libc on a particular platform behave the
> same, which seems like a bad bet to me.  I think we'd be better off to
> avoid asking libc to do anything that might not work everywhere.
> 
> On the other hand, this line of thought might end up having you
> reimplement in formatting.c the same logic I put into snprintf.c
> recently, which seems a bit silly.

We already assume that a positive PGAC_FUNC_SNPRINTF_LONG_LONG_INT_MODIFIER,
PGAC_FUNC_SNPRINTF_ARG_CONTROL and PGAC_FUNC_SNPRINTF_SIZE_T_SUPPORT will hold
forever.  While one can construct counterexamples, I don't recall much push to
change that.  This %1000f problem evades casual testing and will fade from
hacker memory.  We're liable to introduce similar bugs if everyone hacking
high level code like src/backend/utils/adt needs to remember it.

A potential compromise is to verify the configure-detected behavior during
startup_hacks().  The DBA will see clearly when the runtime libraries exhibit
a bug absent at build time.

Re: Zero-padding and zero-masking fixes for to_char(float)

From

Bruce Momjian

Date:

23 March 2015, 15:33:31

On Mon, Mar 23, 2015 at 12:36:25AM -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > On Sun, Mar 22, 2015 at 12:46:08PM -0400, Noah Misch wrote:
> >> I recommend adding a "configure" test to use our snprintf.c replacements if
> >> sprintf("%.*f", 65536, 99999999999.0) gives unexpected output.
> 
> > Do we really want to go to our /port snprintf just to handle 512+
> > digits?
> 
> I'd rather not go that direction (that is, to using a configure test).
> It assumes that all copies of libc on a particular platform behave the
> same, which seems like a bad bet to me.  I think we'd be better off to
> avoid asking libc to do anything that might not work everywhere.
> 
> On the other hand, this line of thought might end up having you
> reimplement in formatting.c the same logic I put into snprintf.c
> recently, which seems a bit silly.

If we can't trust libc for 512+ precision (and I don't think a startup
check is warranted), I think we should either use our internal snprintf
for to_char() in all cases, or in cases where the precision is 512+.  Of
course, this hinges on the assumption that only to_char() cares about
512+ digits --- if not, a startup check seems a requirement.

However, even if we have a working snprintf, the bigger problem is
having to do rounding, e.g. this is the worst case:
SELECT to_char(float4 '5.5555555555555555555555555555', '9D' || repeat('9', 1000) || 'EEEE');

What we have to do here is to round to the specified precision, i.e. we
can't just pad with zeros, which was my original approach.  I am afraid
that means we have to keep the odd coding where we call snprintf with
zero decimal digits, get its length, then figure out the decimal
precision, then call snprintf again to format the string with rounding. 
Then, we need to separate the mantissa from the exponent and add the
desired number of zeros and copy it into a new longer string.

So, instead of the code being cleaner, it will be even more messy, and
will  duplicate some of what we already do in our port/snprintf.c. 
Unfortunately, there is no way to call snprintf and tell it we want a
specific number of decimal digits _and_ pad the rest with zeros.  I
think trying to do the rounding on the output string will never work
well, e.g. 9.9999 rounded to one decimal digit is 10.0.

Yuck.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +

Re: Zero-padding and zero-masking fixes for to_char(float)

From

Noah Misch

Date:

24 March 2015, 13:54:14

On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote:
> On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote:
> > On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote:
> > > This "junk" digit zeroing matches the Oracle behavior:
> > > 
> > >     SELECT to_char(1.123456789123456789123456789d, '9.9999999999999999999999999999999999999') as x from dual;
> > >     ------
> > >     1.1234567891234568000000000000000000000
> > > 
> > > Our output with the patch would be:
> > > 
> > >     SELECT to_char(float8 '1.123456789123456789123456789', '9.9999999999999999999999999999999999999');
> > >     ------
> > >     1.1234567891234500000000000000000000000

> > These outputs show Oracle treating 17 digits as significant while PostgreSQL
> > treats 15 digits as significant.  Should we match Oracle in this respect while
> > we're breaking compatibility anyway?  I tend to think yes.
> 
> Uh, I am hesistant to adjust our precision to match Oracle as I don't
> know what they are using internally.

http://sqlfiddle.com/#!4/8b4cf/5 strongly implies 17 significant digits for
float8 and 9 digits for float4.

Re: Zero-padding and zero-masking fixes for to_char(float)

From

Bruce Momjian

Date:

24 March 2015, 14:05:25

On Tue, Mar 24, 2015 at 09:47:56AM -0400, Noah Misch wrote:
> On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote:
> > On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote:
> > > On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote:
> > > > This "junk" digit zeroing matches the Oracle behavior:
> > > > 
> > > >     SELECT to_char(1.123456789123456789123456789d, '9.9999999999999999999999999999999999999') as x from dual;
> > > >     ------
> > > >     1.1234567891234568000000000000000000000
> > > > 
> > > > Our output with the patch would be:
> > > > 
> > > >     SELECT to_char(float8 '1.123456789123456789123456789', '9.9999999999999999999999999999999999999');
> > > >     ------
> > > >     1.1234567891234500000000000000000000000
> 
> > > These outputs show Oracle treating 17 digits as significant while PostgreSQL
> > > treats 15 digits as significant.  Should we match Oracle in this respect while
> > > we're breaking compatibility anyway?  I tend to think yes.
> > 
> > Uh, I am hesistant to adjust our precision to match Oracle as I don't
> > know what they are using internally.
> 
> http://sqlfiddle.com/#!4/8b4cf/5 strongly implies 17 significant digits for
> float8 and 9 digits for float4.

OK, I am fine in using those values if you can find them as compiler
defines, but I don't see how we can grab those values from a user test
on Oracle.

There are some "invisible" float digits that don't appear in %f but can
be shown if desired --- I think we used to do that in the regression
tests, but found they added too much platform-specific randomness.  Do
we want to go in that direction?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +

Re: Zero-padding and zero-masking fixes for to_char(float)

From

Jeff Anton

Date:

24 March 2015, 16:26:20

The issue of significant (decimal) digits to and from floating point 
representation is a complex one.

What 'significant' means may depend upon the intent.

There are (at least) two different tests which may need to be used.

* How many digits can be stored and then accurately returned?
or
* How many decimal digits are needed to recreate a floating point value?  Or in longer form, if you have a floating
pointvalue, you may want to 

print it in decimal form and then later scan that to recreate the exact 
bit pattern from the original floating point value.  How many decimal 
digits do you need?

The first question produces a smaller number of digits then the second one!

The idea of zero padding is, IMO, a bad idea all together.  It makes 
people feel better, but it adds inaccuracy.  I've lost this 
interpretation so many times now that I only mention it for the real 
number geeks out there.

Postgresql seems to be using the first interpretation and reporting 
fewer digits.  I've noticed this with pg_dump.  That a dump and restore 
of floating point values does not produce the same floating point 
values.  To me, that is inexcusable.  Using the -Fc format, real values 
are preserved.  I have a large database of security prices.  I want 
accuracy above all.

I do not have time right now to produce the needed evidence for all 
these cases of floating point values.  If there is interest I can 
produce this in a day or so.

Jeff Anton

BTW:  This is my first posting to this list.  I should introduce myself.
I'm Jeff Anton.  I was the first Postgres project lead programmer 
working for Michael Stonebraker at U.C. Berkeley a very long time ago.
The first version was never released.  I've since worked for several db 
companies.

On 03/24/15 06:47, Noah Misch wrote:
> On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote:
>> On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote:
>>> On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote:
>>>> This "junk" digit zeroing matches the Oracle behavior:
>>>>
>>>>     SELECT to_char(1.123456789123456789123456789d, '9.9999999999999999999999999999999999999') as x from dual;
>>>>     ------
>>>>     1.1234567891234568000000000000000000000
>>>>
>>>> Our output with the patch would be:
>>>>
>>>>     SELECT to_char(float8 '1.123456789123456789123456789', '9.9999999999999999999999999999999999999');
>>>>     ------
>>>>     1.1234567891234500000000000000000000000
>
>>> These outputs show Oracle treating 17 digits as significant while PostgreSQL
>>> treats 15 digits as significant.  Should we match Oracle in this respect while
>>> we're breaking compatibility anyway?  I tend to think yes.
>>
>> Uh, I am hesistant to adjust our precision to match Oracle as I don't
>> know what they are using internally.
>
> http://sqlfiddle.com/#!4/8b4cf/5 strongly implies 17 significant digits for
> float8 and 9 digits for float4.
>
>

Re: Zero-padding and zero-masking fixes for to_char(float)

From

Andrew Gierth

Date:

24 March 2015, 16:47:40

>>>>> "Jeff" == Jeff Anton <antonpgsql@hesiod.org> writes:
Jeff> Postgresql seems to be using the first interpretation andJeff> reporting fewer digits.  I've noticed this with
pg_dump. That aJeff> dump and restore of floating point values does not produce theJeff> same floating point values.
Tome, that is inexcusable.  UsingJeff> the -Fc format, real values are preserved.  I have a largeJeff> database of
securityprices.  I want accuracy above all.
 

-Fc doesn't do anything different with floats than text dumps do.

pg_dump (in all modes) uses the extra_float_digits setting to try and
get a value that restores exactly. There have been issues with this in
the past (the limit of extra_float_digits had to be raised from 2 to 3,
iirc), but if it's happening now then that should probably be considered
a bug.

-- 
Andrew (irc:RhodiumToad)

Re: Zero-padding and zero-masking fixes for to_char(float)

From

Bruce Momjian

Date:

24 March 2015, 19:08:33

On Tue, Mar 24, 2015 at 10:05:12AM -0400, Bruce Momjian wrote:
> On Tue, Mar 24, 2015 at 09:47:56AM -0400, Noah Misch wrote:
> > On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote:
> > > On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote:
> > > > On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote:
> > > > > This "junk" digit zeroing matches the Oracle behavior:
> > > > > 
> > > > >     SELECT to_char(1.123456789123456789123456789d, '9.9999999999999999999999999999999999999') as x from
dual;
> > > > >     ------
> > > > >     1.1234567891234568000000000000000000000
> > > > > 
> > > > > Our output with the patch would be:
> > > > > 
> > > > >     SELECT to_char(float8 '1.123456789123456789123456789', '9.9999999999999999999999999999999999999');
> > > > >     ------
> > > > >     1.1234567891234500000000000000000000000
> > 
> > > > These outputs show Oracle treating 17 digits as significant while PostgreSQL
> > > > treats 15 digits as significant.  Should we match Oracle in this respect while
> > > > we're breaking compatibility anyway?  I tend to think yes.
> > > 
> > > Uh, I am hesistant to adjust our precision to match Oracle as I don't
> > > know what they are using internally.
> > 
> > http://sqlfiddle.com/#!4/8b4cf/5 strongly implies 17 significant digits for
> > float8 and 9 digits for float4.
> 
> OK, I am fine in using those values if you can find them as compiler
> defines, but I don't see how we can grab those values from a user test
> on Oracle.
> 
> There are some "invisible" float digits that don't appear in %f but can
> be shown if desired --- I think we used to do that in the regression
> tests, but found they added too much platform-specific randomness.  Do
> we want to go in that direction?

How about if we have to_char() honor our extra_float_digits GUC, so
users who want those digits can get them?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +

Re: Zero-padding and zero-masking fixes for to_char(float)

From

"Baker, Keith [OCDUS Non-J&J]"

Date:

24 March 2015, 21:19:22

I like the idea of using extra_float_digits  (which I always set this to 3 to ensure I don't lose precision).
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html#1251   Theorem 15   "When a binary IEEE single precision
numberis converted to the closest eight digit decimal number, it is not always possible to uniquely recover the binary
numberfrom the decimal one. However, if nine decimal digits are used, then converting the decimal number to the closest
binarynumber will recover the original floating-point number." 

Keith Baker

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> owner@postgresql.org] On Behalf Of Bruce Momjian
> Sent: Tuesday, March 24, 2015 3:08 PM
> To: Noah Misch
> Cc: PostgreSQL-development
> Subject: Re: [HACKERS] Zero-padding and zero-masking fixes for
> to_char(float)
>
> On Tue, Mar 24, 2015 at 10:05:12AM -0400, Bruce Momjian wrote:
> > On Tue, Mar 24, 2015 at 09:47:56AM -0400, Noah Misch wrote:
> > > On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote:
> > > > On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote:
> > > > > On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote:
> > > > > > This "junk" digit zeroing matches the Oracle behavior:
> > > > > >
> > > > > >     SELECT to_char(1.123456789123456789123456789d,
> '9.9999999999999999999999999999999999999') as x from dual;
> > > > > >     ------
> > > > > >     1.1234567891234568000000000000000000000
> > > > > >
> > > > > > Our output with the patch would be:
> > > > > >
> > > > > >     SELECT to_char(float8 '1.123456789123456789123456789',
> '9.9999999999999999999999999999999999999');
> > > > > >     ------
> > > > > >     1.1234567891234500000000000000000000000
> > >
> > > > > These outputs show Oracle treating 17 digits as significant
> > > > > while PostgreSQL treats 15 digits as significant.  Should we
> > > > > match Oracle in this respect while we're breaking compatibility
> anyway?  I tend to think yes.
> > > >
> > > > Uh, I am hesistant to adjust our precision to match Oracle as I
> > > > don't know what they are using internally.
> > >
> > > http://sqlfiddle.com/#!4/8b4cf/5 strongly implies 17 significant
> > > digits for
> > > float8 and 9 digits for float4.
> >
> > OK, I am fine in using those values if you can find them as compiler
> > defines, but I don't see how we can grab those values from a user test
> > on Oracle.
> >
> > There are some "invisible" float digits that don't appear in %f but
> > can be shown if desired --- I think we used to do that in the
> > regression tests, but found they added too much platform-specific
> > randomness.  Do we want to go in that direction?
>
> How about if we have to_char() honor our extra_float_digits GUC, so users
> who want those digits can get them?
>
> --
>   Bruce Momjian  <bruce@momjian.us>        http://momjian.us
>   EnterpriseDB                             http://enterprisedb.com
>
>   + Everyone has their own god. +
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make
> changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

Re: Zero-padding and zero-masking fixes for to_char(float)

From

Noah Misch

Date:

25 March 2015, 05:28:08

On Tue, Mar 24, 2015 at 09:26:10AM -0700, Jeff Anton wrote:
> The issue of significant (decimal) digits to and from floating point
> representation is a complex one.
> 
> What 'significant' means may depend upon the intent.

True.  I meant simply that Oracle TO_CHAR emits no more than 17 nonzero digits
for a BINARY_DOUBLE and no more than 9 nonzero digits for a BINARY_FLOAT.

> BTW:  This is my first posting to this list.  I should introduce myself.
> I'm Jeff Anton.  I was the first Postgres project lead programmer working
> for Michael Stonebraker at U.C. Berkeley a very long time ago.
> The first version was never released.  I've since worked for several db
> companies.

Welcome back.

Thanks,
nm

Re: Zero-padding and zero-masking fixes for to_char(float)

From

Noah Misch

Date:

25 March 2015, 05:53:31

On Tue, Mar 24, 2015 at 03:08:26PM -0400, Bruce Momjian wrote:
> On Tue, Mar 24, 2015 at 10:05:12AM -0400, Bruce Momjian wrote:
> > On Tue, Mar 24, 2015 at 09:47:56AM -0400, Noah Misch wrote:
> > > On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote:
> > > > On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote:
> > > > > On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote:
> > > > > > This "junk" digit zeroing matches the Oracle behavior:
> > > > > > 
> > > > > >     SELECT to_char(1.123456789123456789123456789d, '9.9999999999999999999999999999999999999') as x from
dual;
> > > > > >     ------
> > > > > >     1.1234567891234568000000000000000000000
> > > > > > 
> > > > > > Our output with the patch would be:
> > > > > > 
> > > > > >     SELECT to_char(float8 '1.123456789123456789123456789', '9.9999999999999999999999999999999999999');
> > > > > >     ------
> > > > > >     1.1234567891234500000000000000000000000
> > > 
> > > > > These outputs show Oracle treating 17 digits as significant while PostgreSQL
> > > > > treats 15 digits as significant.  Should we match Oracle in this respect while
> > > > > we're breaking compatibility anyway?  I tend to think yes.
> > > > 
> > > > Uh, I am hesistant to adjust our precision to match Oracle as I don't
> > > > know what they are using internally.
> > > 
> > > http://sqlfiddle.com/#!4/8b4cf/5 strongly implies 17 significant digits for
> > > float8 and 9 digits for float4.
> > 
> > OK, I am fine in using those values if you can find them as compiler
> > defines, but I don't see how we can grab those values from a user test
> > on Oracle.

We encounter no authority higher than the test results, so it would be wrong
to seek out and use a define that just happens to match a test result.  Adding
"#define TO_CHAR_DBL_DIG 17" and "#define TO_CHAR_FLT_DIG 9" is good.

> > There are some "invisible" float digits that don't appear in %f but can
> > be shown if desired --- I think we used to do that in the regression
> > tests, but found they added too much platform-specific randomness.  Do
> > we want to go in that direction?

Bare %f simply prints all digits before the decimal point and exactly six
digits after the decimal point.  Whether implementation-defined digits appear
in that output depends on the number's magnitude.  However, float8out and
float4out do behave along the lines of your description.

I do recommend pushing TO_CHAR in that direction, to make it more like Oracle
while we're already breaking compatibility with PostgreSQL 9.4.

> How about if we have to_char() honor our extra_float_digits GUC, so
> users who want those digits can get them?

It's not my first choice; for one thing, no value of extra_float_digits would
yield 17 digits for float8 and 9 digits for float4.

Re: Zero-padding and zero-masking fixes for to_char(float)

From

Bruce Momjian

Date:

01 April 2015, 15:48:48

On Tue, Mar 24, 2015 at 09:47:56AM -0400, Noah Misch wrote:
> On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote:
> > On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote:
> > > On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote:
> > > > This "junk" digit zeroing matches the Oracle behavior:
> > > >
> > > >     SELECT to_char(1.123456789123456789123456789d, '9.9999999999999999999999999999999999999') as x from dual;
> > > >     ------
> > > >     1.1234567891234568000000000000000000000
> > > >
> > > > Our output with the patch would be:
> > > >
> > > >     SELECT to_char(float8 '1.123456789123456789123456789', '9.9999999999999999999999999999999999999');
> > > >     ------
> > > >     1.1234567891234500000000000000000000000
>
> > > These outputs show Oracle treating 17 digits as significant while PostgreSQL
> > > treats 15 digits as significant.  Should we match Oracle in this respect while
> > > we're breaking compatibility anyway?  I tend to think yes.
> >
> > Uh, I am hesistant to adjust our precision to match Oracle as I don't
> > know what they are using internally.
>
> http://sqlfiddle.com/#!4/8b4cf/5 strongly implies 17 significant digits for
> float8 and 9 digits for float4.

I was able to get proper rounding with the attached patch.

    test=> SELECT to_char(float8 '1.123456789123456789123456789', '9.9999999999999999999999999999999999999');
                     to_char
    ------------------------------------------
      1.1234567891234600000000000000000000000
    (1 row)

Handling rounding for exponent-format values turned out to be simple.
What has me stuck now is how to do rounding in the non-decimal part of
the number, e.g.

    test=> SELECT to_char(float4 '15555555555555.912345678912345678900000000000000000000000',
              repeat('9', 50) || '.' || repeat('9', 50));
                                                    to_char
    --------------------------------------------------------------------------------------------------------
                                          15555555753984.00000000000000000000000000000000000000000000000000
    (1 row)

This should return something like 15555600000000.000... (per Oracle
output at the URL above, float4 has 6 significant digits on my compiler)
but I can't seem to figure how to get printf() to round non-fractional
parts.  I am afraid the only solution is to use printf's %e format and
place the decimal point myself.

The fact I still don't have a complete solution suggests this is 9.6
material but I still want to work on it so it is ready.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Attachment

to_char.diff

Re: Zero-padding and zero-masking fixes for to_char(float)

From

Bruce Momjian

Date:

02 April 2015, 18:14:54

On Wed, Apr  1, 2015 at 11:48:37AM -0400, Bruce Momjian wrote:
> This should return something like 15555600000000.000... (per Oracle
> output at the URL above, float4 has 6 significant digits on my compiler)
> but I can't seem to figure how to get printf() to round non-fractional
> parts.  I am afraid the only solution is to use printf's %e format and
> place the decimal point myself.

Hearing nothing, I went with the %e approach;  patch attached.  The new
output looks right:

    test=> SELECT to_char(float4 '15555555555555.912345678912345678900000000000000000000000',
              repeat('9', 50) || '.' || repeat('9', 50));
                                                    to_char
    --------------------------------------------------------------------------------------------------------
                                          15555600000000.00000000000000000000000000000000000000000000000000
    (1 row)

> The fact I still don't have a complete solution suggests this is 9.6
> material but I still want to work on it so it is ready.

I will keep this patch for 9.6 unless I hear otherwise.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Attachment

to_char.diff