Thread: Zero-padding and zero-masking fixes for to_char(float)
In September, while researching the to_char() buffer overflow bugs fixed in 9.4.1 (commit 0150ab567bcf5e5913e2b62a1678f84cc272441f), I found an inconsistency in how to_char() does zero-padding for float4/8 values. Now that 9.4.1 is released and I am home for a while, I am ready to address this. For example, to_char(int4) properly pads with trailing zeros, e.g. SELECT to_char(int4 '1999999999', '9999999999999999D' || repeat('9', 1000)); ------ 1999999999.000000000000000000000000000000... Numeric does the same thing: SELECT to_char(numeric '99999999999', '9999999999999999D' || repeat('9', 1000)); ------ 99999999999.00000000000000000000000000000... However, float4/8 do not supply the requested zero padding: SELECT to_char(float4 '99999999999', '9999999999999999D' || repeat('9', 1000)); ------ 99999997952 and SELECT to_char(float8 '99999999999', '9999999999999999D' || repeat('9', 1000)); ------ 99999999999.0000 float4/8 are padding to the internal precision, while int4/numeric are padding based on the requested precision. This is inconsistent. The first attached patch fixes this, and also zeros the "junk" digits which exceed the precision of the underlying type: SELECT to_char(float4 '99999999999', '9999999999999999D' || repeat('9', 1000)); ------ 99999900000.00000000000000000000000000000... SELECT to_char(float8 '99999999999', '9999999999999999D' || repeat('9', 1000)); ------ 99999999999.0000000000000000000000000000.... This "junk" digit zeroing matches the Oracle behavior: SELECT to_char(1.123456789123456789123456789d, '9.9999999999999999999999999999999999999') as x from dual; ------ 1.1234567891234568000000000000000000000 Our output with the patch would be: SELECT to_char(float8 '1.123456789123456789123456789', '9.9999999999999999999999999999999999999'); ------ 1.1234567891234500000000000000000000000 which is pretty close. The second patch adds regression tests for these. I would like to apply this for 9.5 while I remember what I was doing, but I guess now that I have written this email, I will be able to keep it for 9.6 if people prefer. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Attachment
On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote: > In September, while researching the to_char() buffer overflow bugs fixed > in 9.4.1 (commit 0150ab567bcf5e5913e2b62a1678f84cc272441f), I found an > inconsistency in how to_char() does zero-padding for float4/8 values. > Now that 9.4.1 is released and I am home for a while, I am ready to > address this. ... > float4/8 are padding to the internal precision, while int4/numeric are > padding based on the requested precision. This is inconsistent. > > The first attached patch fixes this, and also zeros the "junk" digits > which exceed the precision of the underlying type: Patch applied. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On 22 March 2015 at 14:46, Bruce Momjian <bruce@momjian.us> wrote:
On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote:
> In September, while researching the to_char() buffer overflow bugs fixed
> in 9.4.1 (commit 0150ab567bcf5e5913e2b62a1678f84cc272441f), I found an
> inconsistency in how to_char() does zero-padding for float4/8 values.
> Now that 9.4.1 is released and I am home for a while, I am ready to
> address this.
...
> float4/8 are padding to the internal precision, while int4/numeric are
> padding based on the requested precision. This is inconsistent.
>
> The first attached patch fixes this, and also zeros the "junk" digits
> which exceed the precision of the underlying type:
Patch applied.
This seems to have broken jacana. Looks like MSVC by default has a 3 digit exponent.
Going by this: https://msdn.microsoft.com/en-us/library/0fatw238(v=vs.80).aspx it seems that it can quite easily be set back to 2.
I've attached a patch which seems to fix the issue.
Regards
David Rowley
Attachment
David Rowley <dgrowleyml@gmail.com> writes: > This seems to have broken jacana. Looks like MSVC by default has a 3 digit > exponent. jacana was broken before this patch; but some other Windows critters are now unhappy as well. > Going by this: > https://msdn.microsoft.com/en-us/library/0fatw238(v=vs.80).aspx it seems > that it can quite easily be set back to 2. > I've attached a patch which seems to fix the issue. That seems likely to have side-effects far beyond what's appropriate. We have gone out of our way to accommodate 3-digit exponents in other tests. What I want to know is why this patch created a 3-digit output where there was none before. regards, tom lane
On Sun, Mar 22, 2015 at 01:42:56AM -0400, Tom Lane wrote: > David Rowley <dgrowleyml@gmail.com> writes: > > This seems to have broken jacana. Looks like MSVC by default has a 3 digit > > exponent. > > jacana was broken before this patch; but some other Windows critters > are now unhappy as well. > > > Going by this: > > https://msdn.microsoft.com/en-us/library/0fatw238(v=vs.80).aspx it seems > > that it can quite easily be set back to 2. > > > I've attached a patch which seems to fix the issue. > > That seems likely to have side-effects far beyond what's appropriate. > We have gone out of our way to accommodate 3-digit exponents in other > tests. What I want to know is why this patch created a 3-digit output > where there was none before. I was wondering the same thing too, but when I grep'ed the regression output files looking for exponents, I found float4-exp-three-digits.out and int8-exp-three-digits.out. I think this means we have had this issue before, and we are going to have to either create a numeric-exp-three-digits.out alternate output file, move the test to one of the existing exp-three-digits files, or remove the tests. Sorry, I didn't expect any problems with this patch as it was so small and localized. What has me more concerned is the Solaris 10 failure. This query: SELECT to_char(float8 '99999999999', '9999999999999999D' || repeat('9', 1000)); expects this: 99999999999.00000000000... but on Solaris 10 gets this: .000000000000000000 Yes, the nines are gone, and only this query is failing. Oddly, this query did not fail, though the only difference is fewer decimal digits: SELECT to_char(float8 '99999999999', '9999999999999999D99999999'); This smells like a libc bug, e.g. OmniOS 5.11 passed the test. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On Sun, Mar 22, 2015 at 11:22:26AM -0400, Bruce Momjian wrote: > What has me more concerned is the Solaris 10 failure. This query: > > SELECT to_char(float8 '99999999999', '9999999999999999D' || repeat('9', 1000)); > > expects this: > > 99999999999.00000000000... > > but on Solaris 10 gets this: > > .000000000000000000 > > Yes, the nines are gone, and only this query is failing. Oddly, this > query did not fail, though the only difference is fewer decimal digits: > > SELECT to_char(float8 '99999999999', '9999999999999999D99999999'); > > This smells like a libc bug, e.g. OmniOS 5.11 passed the test. Use of the "f" conversion specifier with precision greater than 512 is not portable; I get a similar diff on AIX 7. Until this patch, PostgreSQL would not use arbitrarily-large precisions on that conversion specifier. (Who would guess, but the "e" conversion specifier is not affected.) I recommend adding a "configure" test to use our snprintf.c replacements if sprintf("%.*f", 65536, 99999999999.0) gives unexpected output.
When you posted this, I made a note to review it. On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote: > This "junk" digit zeroing matches the Oracle behavior: > > SELECT to_char(1.123456789123456789123456789d, '9.9999999999999999999999999999999999999') as x from dual; > ------ > 1.1234567891234568000000000000000000000 > > Our output with the patch would be: > > SELECT to_char(float8 '1.123456789123456789123456789', '9.9999999999999999999999999999999999999'); > ------ > 1.1234567891234500000000000000000000000 > > which is pretty close. PostgreSQL 9.4 returns "1.12345678912346". Your patch truncates digits past DBL_DIG, whereas both Oracle and PostgreSQL 9.4 round to the nearest DBL_DIG digits. PostgreSQL must continue to round. These outputs show Oracle treating 17 digits as significant while PostgreSQL treats 15 digits as significant. Should we match Oracle in this respect while we're breaking compatibility anyway? I tend to think yes. > *************** int4_to_char(PG_FUNCTION_ARGS) > *** 5214,5221 **** > /* we can do it easily because float8 won't lose any precision */ > float8 val = (float8) value; > > ! orgnum = (char *) palloc(MAXDOUBLEWIDTH + 1); > ! snprintf(orgnum, MAXDOUBLEWIDTH + 1, "%+.*e", Num.post, val); > > /* > * Swap a leading positive sign for a space. > --- 5207,5213 ---- > /* we can do it easily because float8 won't lose any precision */ > float8 val = (float8) value; > > ! orgnum = psprintf("%+.*e", Num.post, val); Neither the submission notes nor the commit messages mentioned it, but this improves behavior for to_char(int4, text). Specifically, it helps EEEE formats with more than about 500 decimal digits: SELECT length(to_char(1, '9D' || repeat('9', 800) || 'EEEE'));
On Sun, Mar 22, 2015 at 12:46:08PM -0400, Noah Misch wrote: > On Sun, Mar 22, 2015 at 11:22:26AM -0400, Bruce Momjian wrote: > > What has me more concerned is the Solaris 10 failure. This query: > > > > SELECT to_char(float8 '99999999999', '9999999999999999D' || repeat('9', 1000)); > > > > expects this: > > > > 99999999999.00000000000... > > > > but on Solaris 10 gets this: > > > > .000000000000000000 > > > > Yes, the nines are gone, and only this query is failing. Oddly, this > > query did not fail, though the only difference is fewer decimal digits: > > > > SELECT to_char(float8 '99999999999', '9999999999999999D99999999'); > > > > This smells like a libc bug, e.g. OmniOS 5.11 passed the test. > > Use of the "f" conversion specifier with precision greater than 512 is not > portable; I get a similar diff on AIX 7. Until this patch, PostgreSQL would > not use arbitrarily-large precisions on that conversion specifier. (Who would > guess, but the "e" conversion specifier is not affected.) Yeah. > I recommend adding a "configure" test to use our snprintf.c replacements if > sprintf("%.*f", 65536, 99999999999.0) gives unexpected output. Do we really want to go to our /port snprintf just to handle 512+ digits? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote: > When you posted this, I made a note to review it. > > On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote: > > This "junk" digit zeroing matches the Oracle behavior: > > > > SELECT to_char(1.123456789123456789123456789d, '9.9999999999999999999999999999999999999') as x from dual; > > ------ > > 1.1234567891234568000000000000000000000 > > > > Our output with the patch would be: > > > > SELECT to_char(float8 '1.123456789123456789123456789', '9.9999999999999999999999999999999999999'); > > ------ > > 1.1234567891234500000000000000000000000 > > > > which is pretty close. > > PostgreSQL 9.4 returns "1.12345678912346". Your patch truncates digits past > DBL_DIG, whereas both Oracle and PostgreSQL 9.4 round to the nearest DBL_DIG > digits. PostgreSQL must continue to round. Ah, that rounding is a big problem. I can't just round because if the digit to be rounded up is '9', I have to set that to zero and increment the previous digit, and that could cascade and shift the entire string one over. I think I have to go back to the original code that does the weird computations to figure out how many digits are on the left of the decimal point, then set the format string after. I was hoping my patch could clean that all up, but we now need snprintf to do that rounding for us. :-( > These outputs show Oracle treating 17 digits as significant while PostgreSQL > treats 15 digits as significant. Should we match Oracle in this respect while > we're breaking compatibility anyway? I tend to think yes. Uh, I am hesistant to adjust our precision to match Oracle as I don't know what they are using internally. > > *************** int4_to_char(PG_FUNCTION_ARGS) > > *** 5214,5221 **** > > /* we can do it easily because float8 won't lose any precision */ > > float8 val = (float8) value; > > > > ! orgnum = (char *) palloc(MAXDOUBLEWIDTH + 1); > > ! snprintf(orgnum, MAXDOUBLEWIDTH + 1, "%+.*e", Num.post, val); > > > > /* > > * Swap a leading positive sign for a space. > > --- 5207,5213 ---- > > /* we can do it easily because float8 won't lose any precision */ > > float8 val = (float8) value; > > > > ! orgnum = psprintf("%+.*e", Num.post, val); > > Neither the submission notes nor the commit messages mentioned it, but this > improves behavior for to_char(int4, text). Specifically, it helps EEEE > formats with more than about 500 decimal digits: > > SELECT length(to_char(1, '9D' || repeat('9', 800) || 'EEEE')); Wow, that is surprising. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote: > > > *************** int4_to_char(PG_FUNCTION_ARGS) > > > *** 5214,5221 **** > > > /* we can do it easily because float8 won't lose any precision */ > > > float8 val = (float8) value; > > > > > > ! orgnum = (char *) palloc(MAXDOUBLEWIDTH + 1); > > > ! snprintf(orgnum, MAXDOUBLEWIDTH + 1, "%+.*e", Num.post, val); > > > > > > /* > > > * Swap a leading positive sign for a space. > > > --- 5207,5213 ---- > > > /* we can do it easily because float8 won't lose any precision */ > > > float8 val = (float8) value; > > > > > > ! orgnum = psprintf("%+.*e", Num.post, val); > > > > Neither the submission notes nor the commit messages mentioned it, but this > > improves behavior for to_char(int4, text). Specifically, it helps EEEE > > formats with more than about 500 decimal digits: > > > > SELECT length(to_char(1, '9D' || repeat('9', 800) || 'EEEE')); > > Wow, that is surprising. Ah, yes. There was a problem where were clipping off int4 EEEE output to MAXDOUBLEWIDTH, which made no sense, so there was the fix for that as well. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Bruce Momjian <bruce@momjian.us> writes: > On Sun, Mar 22, 2015 at 12:46:08PM -0400, Noah Misch wrote: >> I recommend adding a "configure" test to use our snprintf.c replacements if >> sprintf("%.*f", 65536, 99999999999.0) gives unexpected output. > Do we really want to go to our /port snprintf just to handle 512+ > digits? I'd rather not go that direction (that is, to using a configure test). It assumes that all copies of libc on a particular platform behave the same, which seems like a bad bet to me. I think we'd be better off to avoid asking libc to do anything that might not work everywhere. On the other hand, this line of thought might end up having you reimplement in formatting.c the same logic I put into snprintf.c recently, which seems a bit silly. regards, tom lane
On Mon, Mar 23, 2015 at 12:36:25AM -0400, Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > On Sun, Mar 22, 2015 at 12:46:08PM -0400, Noah Misch wrote: > >> I recommend adding a "configure" test to use our snprintf.c replacements if > >> sprintf("%.*f", 65536, 99999999999.0) gives unexpected output. > > > Do we really want to go to our /port snprintf just to handle 512+ > > digits? > > I'd rather not go that direction (that is, to using a configure test). > It assumes that all copies of libc on a particular platform behave the > same, which seems like a bad bet to me. I think we'd be better off to > avoid asking libc to do anything that might not work everywhere. > > On the other hand, this line of thought might end up having you > reimplement in formatting.c the same logic I put into snprintf.c > recently, which seems a bit silly. We already assume that a positive PGAC_FUNC_SNPRINTF_LONG_LONG_INT_MODIFIER, PGAC_FUNC_SNPRINTF_ARG_CONTROL and PGAC_FUNC_SNPRINTF_SIZE_T_SUPPORT will hold forever. While one can construct counterexamples, I don't recall much push to change that. This %1000f problem evades casual testing and will fade from hacker memory. We're liable to introduce similar bugs if everyone hacking high level code like src/backend/utils/adt needs to remember it. A potential compromise is to verify the configure-detected behavior during startup_hacks(). The DBA will see clearly when the runtime libraries exhibit a bug absent at build time.
On Mon, Mar 23, 2015 at 12:36:25AM -0400, Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > On Sun, Mar 22, 2015 at 12:46:08PM -0400, Noah Misch wrote: > >> I recommend adding a "configure" test to use our snprintf.c replacements if > >> sprintf("%.*f", 65536, 99999999999.0) gives unexpected output. > > > Do we really want to go to our /port snprintf just to handle 512+ > > digits? > > I'd rather not go that direction (that is, to using a configure test). > It assumes that all copies of libc on a particular platform behave the > same, which seems like a bad bet to me. I think we'd be better off to > avoid asking libc to do anything that might not work everywhere. > > On the other hand, this line of thought might end up having you > reimplement in formatting.c the same logic I put into snprintf.c > recently, which seems a bit silly. If we can't trust libc for 512+ precision (and I don't think a startup check is warranted), I think we should either use our internal snprintf for to_char() in all cases, or in cases where the precision is 512+. Of course, this hinges on the assumption that only to_char() cares about 512+ digits --- if not, a startup check seems a requirement. However, even if we have a working snprintf, the bigger problem is having to do rounding, e.g. this is the worst case: SELECT to_char(float4 '5.5555555555555555555555555555', '9D' || repeat('9', 1000) || 'EEEE'); What we have to do here is to round to the specified precision, i.e. we can't just pad with zeros, which was my original approach. I am afraid that means we have to keep the odd coding where we call snprintf with zero decimal digits, get its length, then figure out the decimal precision, then call snprintf again to format the string with rounding. Then, we need to separate the mantissa from the exponent and add the desired number of zeros and copy it into a new longer string. So, instead of the code being cleaner, it will be even more messy, and will duplicate some of what we already do in our port/snprintf.c. Unfortunately, there is no way to call snprintf and tell it we want a specific number of decimal digits _and_ pad the rest with zeros. I think trying to do the rounding on the output string will never work well, e.g. 9.9999 rounded to one decimal digit is 10.0. Yuck. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote: > On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote: > > On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote: > > > This "junk" digit zeroing matches the Oracle behavior: > > > > > > SELECT to_char(1.123456789123456789123456789d, '9.9999999999999999999999999999999999999') as x from dual; > > > ------ > > > 1.1234567891234568000000000000000000000 > > > > > > Our output with the patch would be: > > > > > > SELECT to_char(float8 '1.123456789123456789123456789', '9.9999999999999999999999999999999999999'); > > > ------ > > > 1.1234567891234500000000000000000000000 > > These outputs show Oracle treating 17 digits as significant while PostgreSQL > > treats 15 digits as significant. Should we match Oracle in this respect while > > we're breaking compatibility anyway? I tend to think yes. > > Uh, I am hesistant to adjust our precision to match Oracle as I don't > know what they are using internally. http://sqlfiddle.com/#!4/8b4cf/5 strongly implies 17 significant digits for float8 and 9 digits for float4.
On Tue, Mar 24, 2015 at 09:47:56AM -0400, Noah Misch wrote: > On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote: > > On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote: > > > On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote: > > > > This "junk" digit zeroing matches the Oracle behavior: > > > > > > > > SELECT to_char(1.123456789123456789123456789d, '9.9999999999999999999999999999999999999') as x from dual; > > > > ------ > > > > 1.1234567891234568000000000000000000000 > > > > > > > > Our output with the patch would be: > > > > > > > > SELECT to_char(float8 '1.123456789123456789123456789', '9.9999999999999999999999999999999999999'); > > > > ------ > > > > 1.1234567891234500000000000000000000000 > > > > These outputs show Oracle treating 17 digits as significant while PostgreSQL > > > treats 15 digits as significant. Should we match Oracle in this respect while > > > we're breaking compatibility anyway? I tend to think yes. > > > > Uh, I am hesistant to adjust our precision to match Oracle as I don't > > know what they are using internally. > > http://sqlfiddle.com/#!4/8b4cf/5 strongly implies 17 significant digits for > float8 and 9 digits for float4. OK, I am fine in using those values if you can find them as compiler defines, but I don't see how we can grab those values from a user test on Oracle. There are some "invisible" float digits that don't appear in %f but can be shown if desired --- I think we used to do that in the regression tests, but found they added too much platform-specific randomness. Do we want to go in that direction? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
The issue of significant (decimal) digits to and from floating point representation is a complex one. What 'significant' means may depend upon the intent. There are (at least) two different tests which may need to be used. * How many digits can be stored and then accurately returned? or * How many decimal digits are needed to recreate a floating point value? Or in longer form, if you have a floating pointvalue, you may want to print it in decimal form and then later scan that to recreate the exact bit pattern from the original floating point value. How many decimal digits do you need? The first question produces a smaller number of digits then the second one! The idea of zero padding is, IMO, a bad idea all together. It makes people feel better, but it adds inaccuracy. I've lost this interpretation so many times now that I only mention it for the real number geeks out there. Postgresql seems to be using the first interpretation and reporting fewer digits. I've noticed this with pg_dump. That a dump and restore of floating point values does not produce the same floating point values. To me, that is inexcusable. Using the -Fc format, real values are preserved. I have a large database of security prices. I want accuracy above all. I do not have time right now to produce the needed evidence for all these cases of floating point values. If there is interest I can produce this in a day or so. Jeff Anton BTW: This is my first posting to this list. I should introduce myself. I'm Jeff Anton. I was the first Postgres project lead programmer working for Michael Stonebraker at U.C. Berkeley a very long time ago. The first version was never released. I've since worked for several db companies. On 03/24/15 06:47, Noah Misch wrote: > On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote: >> On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote: >>> On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote: >>>> This "junk" digit zeroing matches the Oracle behavior: >>>> >>>> SELECT to_char(1.123456789123456789123456789d, '9.9999999999999999999999999999999999999') as x from dual; >>>> ------ >>>> 1.1234567891234568000000000000000000000 >>>> >>>> Our output with the patch would be: >>>> >>>> SELECT to_char(float8 '1.123456789123456789123456789', '9.9999999999999999999999999999999999999'); >>>> ------ >>>> 1.1234567891234500000000000000000000000 > >>> These outputs show Oracle treating 17 digits as significant while PostgreSQL >>> treats 15 digits as significant. Should we match Oracle in this respect while >>> we're breaking compatibility anyway? I tend to think yes. >> >> Uh, I am hesistant to adjust our precision to match Oracle as I don't >> know what they are using internally. > > http://sqlfiddle.com/#!4/8b4cf/5 strongly implies 17 significant digits for > float8 and 9 digits for float4. > >
>>>>> "Jeff" == Jeff Anton <antonpgsql@hesiod.org> writes: Jeff> Postgresql seems to be using the first interpretation andJeff> reporting fewer digits. I've noticed this with pg_dump. That aJeff> dump and restore of floating point values does not produce theJeff> same floating point values. Tome, that is inexcusable. UsingJeff> the -Fc format, real values are preserved. I have a largeJeff> database of securityprices. I want accuracy above all. -Fc doesn't do anything different with floats than text dumps do. pg_dump (in all modes) uses the extra_float_digits setting to try and get a value that restores exactly. There have been issues with this in the past (the limit of extra_float_digits had to be raised from 2 to 3, iirc), but if it's happening now then that should probably be considered a bug. -- Andrew (irc:RhodiumToad)
On Tue, Mar 24, 2015 at 10:05:12AM -0400, Bruce Momjian wrote: > On Tue, Mar 24, 2015 at 09:47:56AM -0400, Noah Misch wrote: > > On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote: > > > On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote: > > > > On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote: > > > > > This "junk" digit zeroing matches the Oracle behavior: > > > > > > > > > > SELECT to_char(1.123456789123456789123456789d, '9.9999999999999999999999999999999999999') as x from dual; > > > > > ------ > > > > > 1.1234567891234568000000000000000000000 > > > > > > > > > > Our output with the patch would be: > > > > > > > > > > SELECT to_char(float8 '1.123456789123456789123456789', '9.9999999999999999999999999999999999999'); > > > > > ------ > > > > > 1.1234567891234500000000000000000000000 > > > > > > These outputs show Oracle treating 17 digits as significant while PostgreSQL > > > > treats 15 digits as significant. Should we match Oracle in this respect while > > > > we're breaking compatibility anyway? I tend to think yes. > > > > > > Uh, I am hesistant to adjust our precision to match Oracle as I don't > > > know what they are using internally. > > > > http://sqlfiddle.com/#!4/8b4cf/5 strongly implies 17 significant digits for > > float8 and 9 digits for float4. > > OK, I am fine in using those values if you can find them as compiler > defines, but I don't see how we can grab those values from a user test > on Oracle. > > There are some "invisible" float digits that don't appear in %f but can > be shown if desired --- I think we used to do that in the regression > tests, but found they added too much platform-specific randomness. Do > we want to go in that direction? How about if we have to_char() honor our extra_float_digits GUC, so users who want those digits can get them? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: Zero-padding and zero-masking fixes for to_char(float)
From
"Baker, Keith [OCDUS Non-J&J]"
Date:
I like the idea of using extra_float_digits (which I always set this to 3 to ensure I don't lose precision). http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html#1251 Theorem 15 "When a binary IEEE single precision numberis converted to the closest eight digit decimal number, it is not always possible to uniquely recover the binary numberfrom the decimal one. However, if nine decimal digits are used, then converting the decimal number to the closest binarynumber will recover the original floating-point number." Keith Baker > -----Original Message----- > From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers- > owner@postgresql.org] On Behalf Of Bruce Momjian > Sent: Tuesday, March 24, 2015 3:08 PM > To: Noah Misch > Cc: PostgreSQL-development > Subject: Re: [HACKERS] Zero-padding and zero-masking fixes for > to_char(float) > > On Tue, Mar 24, 2015 at 10:05:12AM -0400, Bruce Momjian wrote: > > On Tue, Mar 24, 2015 at 09:47:56AM -0400, Noah Misch wrote: > > > On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote: > > > > On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote: > > > > > On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote: > > > > > > This "junk" digit zeroing matches the Oracle behavior: > > > > > > > > > > > > SELECT to_char(1.123456789123456789123456789d, > '9.9999999999999999999999999999999999999') as x from dual; > > > > > > ------ > > > > > > 1.1234567891234568000000000000000000000 > > > > > > > > > > > > Our output with the patch would be: > > > > > > > > > > > > SELECT to_char(float8 '1.123456789123456789123456789', > '9.9999999999999999999999999999999999999'); > > > > > > ------ > > > > > > 1.1234567891234500000000000000000000000 > > > > > > > > These outputs show Oracle treating 17 digits as significant > > > > > while PostgreSQL treats 15 digits as significant. Should we > > > > > match Oracle in this respect while we're breaking compatibility > anyway? I tend to think yes. > > > > > > > > Uh, I am hesistant to adjust our precision to match Oracle as I > > > > don't know what they are using internally. > > > > > > http://sqlfiddle.com/#!4/8b4cf/5 strongly implies 17 significant > > > digits for > > > float8 and 9 digits for float4. > > > > OK, I am fine in using those values if you can find them as compiler > > defines, but I don't see how we can grab those values from a user test > > on Oracle. > > > > There are some "invisible" float digits that don't appear in %f but > > can be shown if desired --- I think we used to do that in the > > regression tests, but found they added too much platform-specific > > randomness. Do we want to go in that direction? > > How about if we have to_char() honor our extra_float_digits GUC, so users > who want those digits can get them? > > -- > Bruce Momjian <bruce@momjian.us> http://momjian.us > EnterpriseDB http://enterprisedb.com > > + Everyone has their own god. + > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make > changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Mar 24, 2015 at 09:26:10AM -0700, Jeff Anton wrote: > The issue of significant (decimal) digits to and from floating point > representation is a complex one. > > What 'significant' means may depend upon the intent. True. I meant simply that Oracle TO_CHAR emits no more than 17 nonzero digits for a BINARY_DOUBLE and no more than 9 nonzero digits for a BINARY_FLOAT. > BTW: This is my first posting to this list. I should introduce myself. > I'm Jeff Anton. I was the first Postgres project lead programmer working > for Michael Stonebraker at U.C. Berkeley a very long time ago. > The first version was never released. I've since worked for several db > companies. Welcome back. Thanks, nm
On Tue, Mar 24, 2015 at 03:08:26PM -0400, Bruce Momjian wrote: > On Tue, Mar 24, 2015 at 10:05:12AM -0400, Bruce Momjian wrote: > > On Tue, Mar 24, 2015 at 09:47:56AM -0400, Noah Misch wrote: > > > On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote: > > > > On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote: > > > > > On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote: > > > > > > This "junk" digit zeroing matches the Oracle behavior: > > > > > > > > > > > > SELECT to_char(1.123456789123456789123456789d, '9.9999999999999999999999999999999999999') as x from dual; > > > > > > ------ > > > > > > 1.1234567891234568000000000000000000000 > > > > > > > > > > > > Our output with the patch would be: > > > > > > > > > > > > SELECT to_char(float8 '1.123456789123456789123456789', '9.9999999999999999999999999999999999999'); > > > > > > ------ > > > > > > 1.1234567891234500000000000000000000000 > > > > > > > > These outputs show Oracle treating 17 digits as significant while PostgreSQL > > > > > treats 15 digits as significant. Should we match Oracle in this respect while > > > > > we're breaking compatibility anyway? I tend to think yes. > > > > > > > > Uh, I am hesistant to adjust our precision to match Oracle as I don't > > > > know what they are using internally. > > > > > > http://sqlfiddle.com/#!4/8b4cf/5 strongly implies 17 significant digits for > > > float8 and 9 digits for float4. > > > > OK, I am fine in using those values if you can find them as compiler > > defines, but I don't see how we can grab those values from a user test > > on Oracle. We encounter no authority higher than the test results, so it would be wrong to seek out and use a define that just happens to match a test result. Adding "#define TO_CHAR_DBL_DIG 17" and "#define TO_CHAR_FLT_DIG 9" is good. > > There are some "invisible" float digits that don't appear in %f but can > > be shown if desired --- I think we used to do that in the regression > > tests, but found they added too much platform-specific randomness. Do > > we want to go in that direction? Bare %f simply prints all digits before the decimal point and exactly six digits after the decimal point. Whether implementation-defined digits appear in that output depends on the number's magnitude. However, float8out and float4out do behave along the lines of your description. I do recommend pushing TO_CHAR in that direction, to make it more like Oracle while we're already breaking compatibility with PostgreSQL 9.4. > How about if we have to_char() honor our extra_float_digits GUC, so > users who want those digits can get them? It's not my first choice; for one thing, no value of extra_float_digits would yield 17 digits for float8 and 9 digits for float4.
On Tue, Mar 24, 2015 at 09:47:56AM -0400, Noah Misch wrote: > On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote: > > On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote: > > > On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote: > > > > This "junk" digit zeroing matches the Oracle behavior: > > > > > > > > SELECT to_char(1.123456789123456789123456789d, '9.9999999999999999999999999999999999999') as x from dual; > > > > ------ > > > > 1.1234567891234568000000000000000000000 > > > > > > > > Our output with the patch would be: > > > > > > > > SELECT to_char(float8 '1.123456789123456789123456789', '9.9999999999999999999999999999999999999'); > > > > ------ > > > > 1.1234567891234500000000000000000000000 > > > > These outputs show Oracle treating 17 digits as significant while PostgreSQL > > > treats 15 digits as significant. Should we match Oracle in this respect while > > > we're breaking compatibility anyway? I tend to think yes. > > > > Uh, I am hesistant to adjust our precision to match Oracle as I don't > > know what they are using internally. > > http://sqlfiddle.com/#!4/8b4cf/5 strongly implies 17 significant digits for > float8 and 9 digits for float4. I was able to get proper rounding with the attached patch. test=> SELECT to_char(float8 '1.123456789123456789123456789', '9.9999999999999999999999999999999999999'); to_char ------------------------------------------ 1.1234567891234600000000000000000000000 (1 row) Handling rounding for exponent-format values turned out to be simple. What has me stuck now is how to do rounding in the non-decimal part of the number, e.g. test=> SELECT to_char(float4 '15555555555555.912345678912345678900000000000000000000000', repeat('9', 50) || '.' || repeat('9', 50)); to_char -------------------------------------------------------------------------------------------------------- 15555555753984.00000000000000000000000000000000000000000000000000 (1 row) This should return something like 15555600000000.000... (per Oracle output at the URL above, float4 has 6 significant digits on my compiler) but I can't seem to figure how to get printf() to round non-fractional parts. I am afraid the only solution is to use printf's %e format and place the decimal point myself. The fact I still don't have a complete solution suggests this is 9.6 material but I still want to work on it so it is ready. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Attachment
On Wed, Apr 1, 2015 at 11:48:37AM -0400, Bruce Momjian wrote: > This should return something like 15555600000000.000... (per Oracle > output at the URL above, float4 has 6 significant digits on my compiler) > but I can't seem to figure how to get printf() to round non-fractional > parts. I am afraid the only solution is to use printf's %e format and > place the decimal point myself. Hearing nothing, I went with the %e approach; patch attached. The new output looks right: test=> SELECT to_char(float4 '15555555555555.912345678912345678900000000000000000000000', repeat('9', 50) || '.' || repeat('9', 50)); to_char -------------------------------------------------------------------------------------------------------- 15555600000000.00000000000000000000000000000000000000000000000000 (1 row) > The fact I still don't have a complete solution suggests this is 9.6 > material but I still want to work on it so it is ready. I will keep this patch for 9.6 unless I hear otherwise. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +